Preface 


Intended Audience 


This guide is intended for system managers who install and use HP Availability 
Manager software. It is assumed that the system managers who use this product 
are familiar with Microsoft Windows terms and functions. 


Note 


The term Windows as it is used in this manual refers to either Windows 
2000 or Windows XP but not to any other Windows product. 


Document Structure 


This guide contains the following chapters and appendixes: 


Chapter 1 provides an overview of Availability Manager software, including 
security features. 


Chapter 2 tells how to start the Availability Manager, use the main System 
Overview window, select a group of nodes and individual nodes, and use 
online help. 


Chapter 3 tells how to select nodes and display node data; it also explains 
what node data is. 


Chapter 4 tells how to display OpenVMS Cluster summary and detailed data; 
it also explains what cluster data is. 


Chapter 5 tells how to display and interpret events. 


Chapter 6 tells how to take a variety of corrective actions, called fixes, to 
improve system availability. 


Chapter 7 describes the tasks you can perform to filter, select, and customize 
the display of data and events. 


Appendix A contains a table of CPU process states that are referred to in 
Section 3.2.2.4 and in Section 3.3.1. 


Appendix B contains a table of OpenVMS and Windows events that can be 
displayed in the Event pane discussed in Chapter 5. 


Appendix C describes the events that can be signaled for each type of 
OpenVMS data that is collected. 
Related Documents 
The following manuals provide additional information: 


• HP OpenVMS System Manager’s Manual describes tasks for managing 
an OpenVMS system. It also describes installing a product with the 
POLYCENTER Software Installation utility. 


• HP OpenVMS System Management Utilities Reference Manual describes 
utilities you can use to manage an OpenVMS system. 


• HP OpenVMS Programming Concepts Manual explains OpenVMS lock 
management concepts. 


For additional information about HP OpenVMS products and services, visit the 
following World Wide Web address: 


http://www.hp.com/go/openvms 


Reader’s Comments 


HP welcomes your comments on this manual. Please send comments to either of 
the following addresses: 


Internet openvmsdoc@hp.com 


Postal Mail Hewlett-Packard Company 
OSSG Documentation Group, ZKO3-4/U08 
110 Spit Brook Rd. 
Nashua, NH 03062-2698 


How to Order Additional Documentation 


For information about how to order additional documentation, visit the following 
World Wide Web address: 


http://www.hp.com/go/openvms/doc/order 


Conventions 
The following conventions are used in this guide: 


Ctrl/x A sequence such as Ctrl/x indicates that you must hold down 
the key labeled Ctrl while you press another key or a pointing 
device button. 


PF1 x A sequence such as PF1 x indicates that you must first press 
and release the key labeled PF1 and then press and release 
another key or a pointing device button. 


Return In examples, a key name enclosed in a box indicates that 
you press a key on the keyboard. (In text, a key name is not 
enclosed in a box.) 


In the HTML version of this document, this convention appears 
as brackets, rather than a box. 


... A horizontal ellipsis in examples indicates one of the following 
possibilities: 

• Additional optional arguments in a statement have been 
omitted. 

• The preceding item or items can be repeated one or more 
times. 

• Additional parameters, values, or other information can be 
entered. 


. 
. 
. A vertical ellipsis indicates the omission of items from a code 
example or command format; the items are omitted because 
they are not important to the topic being discussed. 


( ) In command format descriptions, parentheses indicate that you 
must enclose choices in parentheses if you specify more than 
one. 


[ ] In command format descriptions, brackets indicate optional 
choices. You can choose one or more items or no items. 
Do not type the brackets on the command line. However, 
you must include the brackets in the syntax for OpenVMS 
directory specifications and for a substring specification in an 
assignment statement. 


| In command format descriptions, vertical bars separate choices 
within brackets or braces. Within brackets, the choices are 
optional; within braces, at least one choice is required. Do not 
type the vertical bars on the command line. 


{ } In command format descriptions, braces indicate required 
choices; you must choose at least one of the items listed. Do 
not type the braces on the command line. 


bold type Bold type represents the introduction of a new term. It also 
represents the name of an argument, an attribute, or a reason. 


italic type Italic type indicates important information, complete titles 
of manuals, or variables. Variables include information that 
varies in system output (Internal error number), in command 
lines (PRODUCER=name), and in command parameters in 
text (where dd represents the predefined code for the device 
type). 


UPPERCASE TYPE Uppercase type indicates a command, the name of a routine, 
the name of a file, or the abbreviation for a system privilege. 


Example This typeface indicates code examples, command examples, and 
interactive screen displays. In text, this type also identifies 
URLs, UNIX commands and pathnames, PC-based commands 
and folders, and certain elements of the C programming 
language. 


- A hyphen at the end of a command format description, 
command line, or code line indicates that the command or 
statement continues on the following line. 


numbers All numbers in text are assumed to be decimal unless 
otherwise noted. Nondecimal radixes (binary, octal, or 
hexadecimal) are explicitly indicated. 
Overview 


This chapter answers the following questions: 

• What is the HP Availability Manager? 

• How does the Availability Manager work? 

• How does the Availability Manager maintain security? 

• How does the Availability Manager identify possible performance problems? 


1.1 What Is the HP Availability Manager? 


The HP Availability Manager is a system management tool that allows you to 
monitor, from an OpenVMS or Windows node, one or more OpenVMS nodes on an 
extended local area network (LAN). 


The Availability Manager helps system managers and analysts target a specific 
node or process for detailed analysis. This tool collects system and process data 
from multiple OpenVMS nodes simultaneously, analyzes the data, and displays 
the output using a graphical user interface (GUI). 


Features and Benefits 


The Availability Manager offers many features that can help system managers 
improve the availability, accessibility, and performance of OpenVMS nodes and 
clusters. 


Feature Description 


Immediate notification of problems 
Based on its analysis of data, the Availability Manager notifies you 
immediately if any node you are monitoring is experiencing a performance 
problem, especially one that affects the node’s accessibility to users. At a 
glance, you can see whether a problem is a persistent one that warrants 
further investigation and correction. 


Centralized management 
Provides centralized management of remote nodes within an extended local 
area network (LAN). 


Intuitive interface 
Provides an easy-to-learn and easy-to-use graphical user interface (GUI). 
An earlier version of the tool, DECamds, uses a Motif GUI to display 
information about OpenVMS nodes. The Availability Manager uses a Java GUI 
to display information about OpenVMS nodes on an OpenVMS or a Windows node. 


Correction capability 
Allows real-time intervention, including adjustment of node and process 
parameters, even when remote nodes are hung. 


Uses its own protocol 
An important advantage of the Availability Manager is that it uses its own 
network protocol. Unlike most performance monitors, the Availability Manager 
does not rely on TCP/IP or any other standard protocol. Therefore, even if a 
standard protocol is unavailable, the Availability Manager can continue to 
operate. 


Customization 
Using a wide range of customization options, you can customize the 
Availability Manager to meet the requirements of your particular site. For 
example, you can change the severity levels of the events that are displayed 
and escalate their importance. 


Scalability 
Makes it easier to monitor multiple OpenVMS nodes. 


Figure 1-1 is an example of the initial System Overview window of the 
Availability Manager. 


Figure 1-1 System Overview Window 
The System Overview window is divided into the following sections: 


• In the upper section of the display is a list of user-defined groups and a list 
of nodes in each group. You can compress the display to only the name of a 
group by clicking the handle preceding the group name. The summary group 
line remains, showing the collected information for all the nodes in the group. 

If a node name displays a red icon, you can hold the cursor over the icon, 
the node name, or the number in the Events column to display a tooltip 
explaining what the problem is; for example, for the node DBGAVC, the 
following message is displayed: 


HIHRDP, high hard page fault rate 


This section of the window is called the Group/Node pane. 


• In the lower section of the window, events are posted, alerting you to possible 
problems on your system. The items on the pane vary, depending on the 
severity of the problem: the most severe problems are displayed first. This 
section of the window is called the Event pane. 


1.2 How Does the Availability Manager Work? 
The Availability Manager uses two types of nodes to monitor systems: 


• One or more OpenVMS Data Collector nodes, which contain the software that 
collects data. 


• An OpenVMS or a Windows Data Analyzer node, which contains the software 
that analyzes the collected data. 


The Data Analyzer and Data Collector nodes communicate over an extended 
LAN using an IEEE 802.3 Extended Packet format protocol. Once a connection 
is established, the Data Analyzer instructs the Data Collector to gather specific 
system and process data. 


Although you can run the Data Analyzer as a member of a monitored cluster, it 
is typically run on a system that is not a member of a monitored cluster. In this 
way, the Data Analyzer will not hang if the cluster hangs. 


Only one Data Analyzer at a time should be running on each node; however, more 
than one can be running in the LAN at any given time. 


Figure 1-2 shows a possible configuration of Data Analyzer and Data Collector 
nodes. 
Figure 1-2 Availability Manager Node Configuration 
In Figure 1-2, the Data Analyzer can monitor nodes A, B, and C across the 
network. The password on node D does not match the password of the Data 
Analyzer; therefore, the Data Analyzer cannot monitor node D. 


For information about password security, see Section 1.3. 


Requesting and Receiving Information 
After installing the Availability Manager software, you can begin to request 
information from one or more Data Collector nodes. 


Requesting and receiving information requires the Availability Manager to 
perform a number of steps, which are shown in Figure 1-3 and explained after 
the figure. 
Figure 1-3 Requesting and Receiving Information 
The following steps correspond to the numbers in Figure 1-3. 


1. The GUI communicates users’ requests for data to the driver on the Data 
Analyzer node: 


• On Windows systems, the Windows driver is part of the Windows kit. 


• On OpenVMS systems, the OpenVMS driver is called the Data Collector 
driver and is included in the Data Collector kit. This is the same driver 
that is on the Data Collector node. 


2. The driver on the Data Analyzer sends users’ requests across the network to 
the driver on the Data Collector node. 


3. The driver on the Data Collector transmits the requested information over 
the network to the driver on the Data Analyzer node. 


4. The driver on the Data Analyzer node passes the requested information to the 
GUI, which displays the data. 


In step 4, the Availability Manager also checks the data for any events that 
should be posted. Section 1.4 explains in more detail how data analysis and 
event detection work. 


Note 


More than one Windows or OpenVMS Data Analyzer node can collect data 
from the same Data Collector node. 
Communicating Through a Private LAN Transport 

The Availability Manager protocol is based on the 802.3 Extended Packet Format 
(also known as SNAP). The IEEE Availability Manager protocol values are as 
follows: 

Protocol ID: 08-00-2B-80-48 
Multicast Address: 09-00-2B-02-01-09 


If the routers in your network filter protocols, add these values to the protocols 
that the routers pass so that the private transport is propagated over the routers. 
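For readers configuring router filters, the following Python sketch shows where the two values above sit in a frame that uses the standard IEEE 802.3 Extended Packet Format: an 802.2 LLC header with DSAP and SSAP set to 0xAA and control 0x03, followed by the 5-byte SNAP protocol ID (the OUI 08-00-2B plus the 2-byte identifier 80-48). It builds only the encapsulation header; the layout of the Availability Manager payload is not described in this guide, and the function name and the source address used in the test are illustrative assumptions.

```python
import struct

# Values from this guide: the multicast destination and the 5-byte SNAP protocol ID.
AM_MULTICAST = bytes.fromhex("09002B020109")
AM_PROTOCOL_ID = bytes.fromhex("08002B8048")   # OUI 08-00-2B + protocol identifier 80-48

# 802.2 LLC header for SNAP: DSAP=0xAA, SSAP=0xAA, control=0x03 (unnumbered information).
LLC_SNAP = bytes([0xAA, 0xAA, 0x03])


def snap_frame_header(src_mac: bytes, payload_len: int,
                      dst_mac: bytes = AM_MULTICAST) -> bytes:
    """Return the 22-byte 802.3 + LLC/SNAP header for a payload of payload_len bytes."""
    if len(src_mac) != 6 or len(dst_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    snap = LLC_SNAP + AM_PROTOCOL_ID           # 3 + 5 = 8 bytes
    length = len(snap) + payload_len           # 802.3 length counts LLC/SNAP plus data
    if length > 1500:
        raise ValueError("802.3 data field cannot exceed 1500 bytes")
    # destination (6) + source (6) + length (2, big-endian) + LLC/SNAP (8)
    return dst_mac + src_mac + struct.pack("!H", length) + snap
```

In this layout the protocol ID occupies bytes 17 through 21 (counting from 0) of the header, which is the field a protocol filter would typically match on.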


1.3 How Does the Availability Manager Maintain Security? 


The Availability Manager uses passwords to maintain security. Passwords are 
eight alphanumeric characters long. The Data Analyzer stores passwords in its 
customization file. On OpenVMS Data Collector nodes, passwords are part of a 
three-part security code called a security triplet. 


The following sections explain these security methods further. 


1.3.1 Data Analyzer Password Security 


For monitoring to take place, the password on a Data Analyzer node must match 
the password section of the security triplet on each OpenVMS Data Collector 
node. OpenVMS Data Collectors also impose other security measures, which are 
explained in Section 1.3.2. 


Figure 1-4 illustrates how you can use passwords to limit access to node 
information. 


Figure 1-4 Availability Manager Password Matching 
As shown in Figure 1-4, the Testing Department’s Data Analyzer, whose 
password is HOMERUNS, can access only OpenVMS Data Collector nodes with 
the HOMERUNS password as part of their security triplets. The same is true of 
the Accounting Department’s Data Analyzer, whose password is BATTERUP; it 
can access only OpenVMS Data Collector nodes with the BATTERUP password 
as part of their security triplets. 
The Availability Manager sets a default password when you install the Data 
Analyzer. To change that password, you must use the OpenVMS Security 
Customization page (see Figure 7-21), which is explained in Chapter 7. 


1.3.2 OpenVMS Data Collector Security 
OpenVMS Data Collector nodes have the following security features: 


Availability Manager data-transfer security 


Each OpenVMS node running as a Data Collector has a file containing a list 
of security triplets. For Data Analyzer and Data Collector nodes to exchange 
data, the passwords on these nodes must match. 


In addition, the triplet specifies the type of access a Data Analyzer has. By 
specifying the hardware address of the Data Analyzer, the triplet can also 
restrict which Data Analyzer nodes are able to access the Data Collector. 


Section 1.3.3 explains security triplets and how to edit them. 
Availability Manager security log 


An OpenVMS Data Collector logs all access denials and executed 
write instructions to the operator communications manager (OPCOM). 
Messages are displayed on all terminals that have OPCOM enabled (with 
the REPLY/ENABLE command). OPCOM also puts messages in the 
SYS$MANAGER:OPERATOR.LOG file. 


Each security log entry contains the network address of the initiator. If 
access is denied, the log entry also indicates whether a read or write was 
attempted. If a write operation was performed, the log entry indicates the 
process identifier (PID) of the affected process. 


OpenVMS file protection and process privileges 


When the Availability Manager is installed, it creates a directory 
(SYS$COMMON:[AMDS$AM]) and sets directory and file protections on 
it so that only the SYSTEM account can read the files in that directory. For 
additional security on these system-level directories and files, you can create 
access control lists (ACLs) to restrict and set alarms on write access to the 
security files. 


For more information about creating ACLs, see the HP OpenVMS Guide to 
System Security. 


1.3.3 Changing Security Triplets on OpenVMS Data Collector Nodes 
To change security triplets on an OpenVMS Data Collector node, you must 
edit the AMDS$DRIVER_ACCESS.DAT file, which is installed on all Data 
Collector nodes. The following sections explain what a security triplet is, how the 
Availability Manager uses it, and how to change it. 


1.3.3.1 Understanding OpenVMS Security Triplets 


A security triplet determines which nodes can access system data from an 
OpenVMS Data Collector node. The AMDS$DRIVER_ACCESS.DAT file on 
OpenVMS Data Collector nodes lists security triplets. 


On OpenVMS Data Collector nodes, the AMDS$AM_CONFIG logical translates to 
the location of the default security file, AMDS$DRIVER_ACCESS.DAT. This file 
is installed on all OpenVMS Data Collector nodes. 
A security triplet is a three-part record whose fields are separated by backslashes 
(\). A triplet consists of the following fields: 


• A network address (hardware address or wildcard character) 

• An 8-character alphanumeric password 

The password is not case sensitive (so the passwords “testtest” and 
“TESTTEST” are considered to be the same). 

• A read, write, or control (R, W, or C) access verification code 


The exclamation point (!) is a comment delimiter; any characters to the right of 
the comment delimiter are ignored. 


Example 


All Data Collector nodes in group FINANCE have the following 
AMDS$DRIVER_ACCESS.DAT file: 


*\FINGROUP\R     ! Let anyone with FINGROUP password read 
! 
2.1\DEVGROUP\W   ! Let only DECnet node 2.1 with 
                 ! DEVGROUP password perform fixes (writes) 
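To make the file format concrete, here is a hedged Python sketch of a parser for AMDS$DRIVER_ACCESS.DAT-style text. It follows the rules stated above: backslash-separated fields, an 8-character alphanumeric password that is not case sensitive, an R, W, or C access code, and ! as a comment delimiter. The Triplet type and parse_access_file function are hypothetical, and tolerating spaces around fields is an assumption, not documented driver behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    address: str   # "*", a DECnet Phase IV address such as "2.1", or a hardware address
    password: str  # stored in uppercase, because passwords are not case sensitive
    access: str    # "R" (read), "W" (write), or "C" (control)


def parse_access_file(text: str) -> list[Triplet]:
    """Parse AMDS$DRIVER_ACCESS.DAT-style text into Triplet records (illustrative sketch)."""
    triplets = []
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # "!" starts a comment; ignore the rest of the line
        if not line:
            continue                         # blank or comment-only line
        fields = [f.strip() for f in line.split("\\")]  # assumption: spaces around fields ignored
        if len(fields) != 3:
            raise ValueError(f"expected address\\password\\access: {raw!r}")
        address, password, access = fields
        password, access = password.upper(), access.upper()
        if len(password) != 8 or not password.isalnum():
            raise ValueError(f"password must be 8 alphanumeric characters: {raw!r}")
        if access not in ("R", "W", "C"):
            raise ValueError(f"access code must be R, W, or C: {raw!r}")
        triplets.append(Triplet(address.upper(), password, access))
    return triplets


if __name__ == "__main__":
    sample = r"""*\FINGROUP\R     ! Let anyone with FINGROUP password read
!
2.1\DEVGROUP\W   ! Let only DECnet node 2.1 with
                 ! DEVGROUP password perform fixes (writes)
"""
    for t in parse_access_file(sample):
        print(t)
```

Applied to the FINANCE example, the sketch yields two records; the comment-only lines are skipped.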


1.3.3.2 How to Change a Security Triplet 
Note 


The configuration files for DECamds and the Availability Manager are 
separate; only one set is used, depending on which startup command 
procedure you use to start the driver. 


For more information about the configuration file setup for both 
DECamds and the Availability Manager, see the HP Availability Manager 
Installation Guide. 


On each Data Collector node on which you want to change security, you must 
edit the AMDS$DRIVER_ACCESS.DAT file. The data in the 
AMDS$DRIVER_ACCESS.DAT file is set up as follows: 


Network address\password\access 
Use a backslash character (\) to separate the three fields. 
To edit the AMDS$DRIVER_ACCESS.DAT file, follow these steps: 
1. Edit the network address. 
The network address can be either of the following: 


• Hardware address 


The hardware address field is the physical hardware address in the LAN 
device chip. It is used if you have multiple LAN devices or are running 
the HP DECnet-Plus for OpenVMS networking software on the system 
(not the HP DECnet Phase IV for OpenVMS networking software). 


For devices provided by HP, the hardware address is in the form 
08-00-2B-xx-xx-xx, where the 08-00-2B portion is HP’s valid range of LAN 
addresses as defined by the IEEE 802 standards, and the xx-xx-xx portion 
is chip specific. 
To determine the value of the hardware address on a node, use the 
OpenVMS System Dump Analyzer (SDA) as follows: 


$ ANALYZE/SYSTEM 
SDA> SHOW LAN 


These commands display a list of available devices. Choose the template 
device of the LAN device you will be using, and then enter the following 
command: 


SDA> SHOW LAN/DEVICE=xxA0 


• Wildcard address 


The wildcard character (*) allows any incoming triplet with a matching 
password field to access the Data Collector node. Use the wildcard 
character to allow read access and to run the console application from any 
node in your network. 


Because the Data Analyzer does not use this field, use the wildcard 
character in this field in the AMDS$CONSOLE_ACCESS.DAT file. 


Caution: Use of the wildcard character for write-access security triplets 
enables any person using that node to perform system-altering fixes. 


2. Edit the password field. 


The password field must be an 8-byte alphanumeric field. The Availability 
Manager converts the password to uppercase, so "aaaaaaaa" and "AAAAAAAA" 
are the same password to the Data Collector. 


The password field gives you a second level of protection when you want to 
use the wildcard address denotation to allow multiple modes of access to your 
monitored system. 


3. Enter R, W, or C as an access code: 


R means READONLY access to the Data Analyzer. 


W means READ/WRITE access to the Data Analyzer. (WRITE implies 
READ.) 


C means CONTROL access to the Data Analyzer. CONTROL allows you 
to manipulate objects from which data are derived. (CONTROL implies 
both WRITE and READ.) 


The following security triplets are all valid; an explanation follows the 
exclamation point (!). 


*\1decamds\r                  ! Anyone with password "1decamds" can monitor 
*\1decamds\w                  ! Anyone with password "1decamds" can monitor or write 
2.1\1decamds\r                ! Only node 2.1 with password "1decamds" can monitor 
2.1\1decamds\w                ! Only node 2.1 with password "1decamds" can monitor and write 
08-00-2b-03-23-cd\1decamds\w  ! Allows a particular hardware address to write 
08-00-2b-03-23-cd\1decamds\r  ! Allows a particular hardware address to read node data 


OpenVMS Data Collector nodes accept more than one password. Therefore, you 
might have several security triplets in an AMDS$DRIVER_ACCESS.DAT file for 
one Data Collector node. For example: 
*\1DECAMDS\R 
*\KOINECLS\R 
*\KOINEFIX\W 
*\AVAILMAN\C 


In this example, Data Analyzer nodes with the passwords 1DECAMDS and 
KOINECLS are able to see the Data Collector data, but only the Data Analyzer 
node with the KOINEFIX password is able to write or change information, 
including performing fixes, on the Data Collector node. The Data Analyzer node 
with the AVAILMAN password is able to perform switched LAN fixes and other 
control functions. 


You can choose to set up your AMDS$DRIVER_ACCESS.DAT file to allow anyone 
on the local LAN to read from your system, but to allow only certain nodes to 
write or change process or device characteristics on your system. For example: 


*\1DECAMDS\R 
08-00-2B-03-23-CD\2NODEFIX\C 


In this example, any Data Analyzer node using the 1DECAMDS password can 
read data from your system. However, only the Data Analyzer node with the 
hardware address 08-00-2B-03-23-CD and the password 2NODEFIX can perform 
fixes and other control functions. 


Note 


After editing the AMDS$DRIVER_ACCESS.DAT file, you must stop and 
then restart the Data Collector. This action loads the new data into the 
driver. (See Section 2.1.3.) 


1.3.4 Processing Security Triplets 
The Availability Manager performs these steps when using security triplets to 
ensure security among Data Analyzer and Data Collector nodes: 


1. A message is broadcast at regular intervals to all nodes within the LAN 
indicating the availability of a Data Collector node to communicate with a 
Data Analyzer node. 


2. The node running the Data Analyzer receives the message, returns a 
password to the Data Collector, and requests system data from the Data 
Collector. 


3. The password and network address of the Data Analyzer are used to search 
the security triplets in the AMDS$DRIVER_ACCESS.DAT file. 


• If the Data Analyzer password and network address match one of the 
security triplets on the Data Collector, then the Data Collector and the 
Data Analyzer can exchange information. 


• If the Data Analyzer password and network address do not match any 
of the security triplets, then access is denied and a message is logged 
to OPCOM. (See Table 1-2 for more information on logging this type of 
message.) In addition, the Data Analyzer receives a message stating that 
access to that node is not permitted. 
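The search in step 3, combined with the rule that W implies R and C implies both W and R, can be sketched in Python as follows. This is an illustrative model, not the Data Collector driver's actual algorithm: the names are hypothetical, a Data Analyzer is assumed to be identified by a set of addresses (for example, its DECnet Phase IV address and its hardware address), and the assumption that access is granted if any matching triplet carries a sufficient access code (rather than only the first match) is not stated in this guide.

```python
from dataclasses import dataclass

# Access codes ranked so that a higher code implies the lower ones:
# W (write) implies R (read); C (control) implies both W and R.
ACCESS_RANK = {"R": 1, "W": 2, "C": 3}


@dataclass(frozen=True)
class Triplet:
    address: str   # "*" wildcard, DECnet Phase IV address (e.g. "2.1"), or hardware address
    password: str  # 8 alphanumeric characters, compared case-insensitively
    access: str    # "R", "W", or "C"


def address_matches(pattern: str, analyzer_addresses: set[str]) -> bool:
    """A '*' pattern matches any Data Analyzer; otherwise the address must match exactly."""
    return pattern == "*" or pattern.upper() in analyzer_addresses


def check_access(triplets: list[Triplet], analyzer_addresses: set[str],
                 password: str, requested: str) -> bool:
    """Return True if a triplet matching the analyzer's password and address grants
    at least the requested access level; False corresponds to access being denied."""
    addrs = {a.upper() for a in analyzer_addresses}
    pw = password.upper()
    need = ACCESS_RANK[requested.upper()]
    for t in triplets:
        if (t.password.upper() == pw
                and address_matches(t.address, addrs)
                and ACCESS_RANK[t.access.upper()] >= need):
            return True
    return False


if __name__ == "__main__":
    rules = [Triplet("*", "1DECAMDS", "R"), Triplet("08-00-2B-03-23-CD", "2NODEFIX", "C")]
    print(check_access(rules, {"08-00-2b-11-22-33"}, "1decamds", "R"))  # True: wildcard read
    print(check_access(rules, {"08-00-2b-11-22-33"}, "1decamds", "W"))  # False: read only
```

The rules in the usage example mirror the earlier file that lets anyone with 1DECAMDS read but lets only the node at 08-00-2B-03-23-CD, using 2NODEFIX, perform fixes and control functions.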
Table 1-1 describes how the Data Collector node interprets a security triplet 
match. 


Table 1-1 Security Triplet Verification 


Security Triplet Interpretation 


08-00-2B-12-34-56\HOMETOWN\W 
The Data Analyzer has write access to the node only when the Data Analyzer 
is run from a node with this hardware address (multiadapter or DECnet-Plus 
system) and with the password HOMETOWN. 


2.1\HOMETOWN\R 
The Data Analyzer has read access to the node when run from a node with 
DECnet for OpenVMS Phase IV address 2.1 and the password HOMETOWN. 


*\HOMETOWN\R 
Any Data Analyzer with the password HOMETOWN has read access to the node. 


Sending Messages to OPCOM 


The logical names shown in Table 1-2 control the sending of messages to OPCOM 
and are defined in the AMDS$LOGICALS.COM file on the Data Collector node. 


Table 1-2  DECamds Logical Names for OPCOM Messages 


AMDS$RM_OPCOM_READ A value of TRUE logs read failures to OPCOM. 
AMDS$RM_OPCOM_WRITE A value of TRUE logs write failures to OPCOM. 


To put these changes into effect, restart the Data Collector with the following 
command: 


$ @SYS$STARTUP:AMDS$STARTUP RESTART 


1.4 How Does the Availability Manager Identify Performance Problems? 


When the Availability Manager detects problems on your system, it uses a 
combination of methods to bring these problems to the attention of the system 
manager. To determine which problem areas to signal, it considers both the types 
of data collected and how often the data is collected and analyzed. Performance 
problems are also posted in the Event pane, which is in the lower portion of the 
System Overview window (Figure 1-1). 


The following topics are related to the method of detecting problems and posting 
events: 


• Collecting and analyzing data 
• Posting events 


1.4.1 Collecting and Analyzing Data 


This section explains how the Availability Manager collects and analyzes data. It 
also defines related terms. 




1.4.1.1 Events and Data Collection 


The data that the Availability Manager collects is grouped into data collections. 
These collections are composed of related data—for example, CPU data, memory 
data, and so on. Usually, the data items on a single tab (like the ones displayed in 
Figure 1-5) make up one data collection. 


Figure 1-5 Sample Node Summary 
An event is a problem or potential problem associated with resource availability. 
Events are associated with various data collections. For example, the CPU 
Process data collection shown in Figure 1-6 is associated with the PRCCUR, 
PRCMWT, and PRCPWT events. (Appendix B describes events, and Appendix C 
describes the events that each type of data collection can signal.) For these events 
to be signaled, you must enable the CPU Process data collection, as described in 
Section 1.4.1.2. 


You can also customize the criteria for posting events, as described in Section 1.4.2. 


1.4.1.2 Types of Data Collection 




You can use the Availability Manager to collect data either as a background 
activity or as a foreground activity. 


Note that for either type of data collection, if you collect data for a specific node, 
only that node is affected. If you collect data for a group, all the nodes in that 
group are affected. 


• Background data collection 


When you enable background collection of a specific type of data collection on 
a specific node, the Availability Manager collects that data whether or not any 
windows are currently displaying data for that node. 
To enable background data collection, select the check box for a specific type 
of data collection on the Data Collection Customization page (Figure 1-6). 
Note that if the Customize window applies to all OpenVMS nodes, the data 
collection properties that you set are for all nodes. If the window applies to a 
specific node, the properties you set apply only to that node. 


Chapter 7 contains additional instructions for customizing data collection 
properties. 


Figure 1-6 Data Collection Customization 


[Screenshot: Customization - OpenVMS Default Settings window, Data Collection tab, showing a Collect check box and update intervals for Cluster summary, CPU mode, CPU process, Disk status, Disk volume, I/O, Lock contention, Memory, Node summary, Page/Swap file, Single disk, and Single process] 


• Foreground data collection 


Foreground data collection occurs automatically when you open any data page 
for a specific node. To open a node data page, double-click a node name in the 
Group/Node pane of the System Overview window (Figure 1-1). The Node Summary 
page is the first page displayed (by default); Figure 1-5 is an example. At the 
top of the page are tabs that you can select to display other data pages for 
that node. 


Foreground data collection for all data types begins automatically when any 
node data page is displayed. Data collection ends when all node data pages 
have been closed. 


Chapter 3 contains instructions for selecting nodes and displaying node data. 
1.4.1.3 Data Collection Intervals 

Data collection intervals, which are displayed on the Data Collection 
customization page (Figure 1-6), specify the frequency of data collection. 
Table 1-3 describes these intervals. 


Table 1-3 Data Collection Intervals 


Interval (in seconds)    Type of Data Collection    Description 


NoEvent    Background 
How often data is collected if no events have been posted for that type 
of data. 

The Availability Manager starts background data collection at the 
NoEvent interval (for example, every 75 seconds). If no events have 
been posted for that type of data, the Availability Manager starts a new 
collection cycle every 75 seconds. 


Event    Background 
How often data is collected if any events have been posted for that type 
of data. 

The Availability Manager continues background data collection at the 
Event interval until all events for that type of data have been removed 
from the Event pane. Data collection then resumes at the NoEvent 
interval. 


Display    Foreground 
How often data is collected when the page for a specific node is open. 

The Availability Manager starts foreground data collection at the 
Display interval and continues this rate of collection until the display 
is closed. Data collection then resumes as a background activity. 
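The interval rules in Table 1-3 amount to a small decision, sketched here in Python for a single type of data collection. The Intervals type, the next_interval function, and all interval values other than the 75-second NoEvent example are hypothetical. The sketch assumes background collection is enabled, and it gives an open node page precedence, following the table's statement that foreground collection continues at the Display interval until the display is closed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervals:
    no_event: float  # background interval while no events are posted for this data type
    event: float     # background interval while events for this data type are posted
    display: float   # foreground interval while a data page for the node is open


def next_interval(iv: Intervals, page_open: bool, posted_events: int) -> float:
    """Return the number of seconds until the next collection of one data type."""
    if page_open:
        return iv.display      # foreground collection runs until the page is closed
    if posted_events > 0:
        return iv.event        # stay at the Event interval until all events are removed
    return iv.no_event         # normal background rate


if __name__ == "__main__":
    iv = Intervals(no_event=75.0, event=20.0, display=5.0)  # hypothetical values
    print(next_interval(iv, page_open=False, posted_events=0))  # 75.0
    print(next_interval(iv, page_open=False, posted_events=3))  # 20.0
    print(next_interval(iv, page_open=True, posted_events=3))   # 5.0
```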


1.4.2 Posting Events 


The Availability Manager evaluates each data collection for events. The
Availability Manager posts events when data values in a data collection meet
or exceed user-defined thresholds and occurrences. Values for thresholds and
occurrences are displayed on Event Customization pages similar to the one shown
in Figure 1-7. Thresholds and occurrences are described in the next section.
Figure 1-7 Sample Event Customization
1.4.2.1 Thresholds and Occurrences


Thresholds and occurrences are criteria that the Availability Manager uses for 
posting events. 


A threshold is a value against which data in a data collection is compared. An 
occurrence is a value that represents the number of consecutive data collections 
that meet or exceed the threshold. 


Both thresholds and occurrences are customizable values that you can adjust 
according to the needs of your system. For details about how to change the values 
for thresholds and occurrences, see Chapter 7. 


Relationship Between Thresholds and Occurrences 


For a particular event, when the data collected meets or exceeds the threshold,
the data collection enters a threshold-exceeded state. When the number of
consecutive data collections that enter this state meets or exceeds the value in
the Occurrence box (see Figure 1-7), the Availability Manager displays (posts)
the event in the Event pane.


A closer look at Figure 1-7 shows the relationship between thresholds and
occurrences. For the DSKERR (high disk device error count) event, a threshold
of 15 errors has been set. A value of 2 in the Occurrence box indicates that the
number of errors must meet or exceed the threshold of 15 in 2 consecutive data
collections for the DSKERR event to be posted.
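
The threshold-and-occurrence rule can be expressed as a short Python sketch. The function name and the sample error counts are hypothetical; the sketch assumes, as described above, that a collection below the threshold breaks the run of consecutive threshold-exceeded collections.

```python
from typing import Iterable, Optional


def posting_collection(values: Iterable[int], threshold: int,
                       occurrence: int) -> Optional[int]:
    """Return the 0-based index of the data collection at which the event is
    posted, or None if the threshold/occurrence condition is never met."""
    consecutive = 0
    for index, value in enumerate(values):
        if value >= threshold:
            consecutive += 1  # this collection is in the threshold-exceeded state
            if consecutive >= occurrence:
                return index
        else:
            consecutive = 0   # the run of consecutive collections is broken
    return None


if __name__ == "__main__":
    # DSKERR example: threshold of 15 errors, Occurrence value of 2.
    error_counts = [16, 3, 15, 20]
    print(posting_collection(error_counts, threshold=15, occurrence=2))  # 3
```

The first collection (16 errors) exceeds the threshold, but the second (3 errors) resets the count, so the event is not posted until the third and fourth collections both meet or exceed 15.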
Note


Before you start this chapter, be sure to read the explanation of data 
collections, events, thresholds, and occurrences, as well as background 
and foreground data collection in Chapter 1. 


This chapter provides the following information: 

• How to configure and start the Availability Manager Data Collector

• How to start the Availability Manager Data Analyzer

• How to use the main System Overview window

• How to display basic node data

• How to get help when you need it

• How to print an Availability Manager page


For information about installing the HP Availability Manager on OpenVMS or 
Windows systems, see the HP Availability Manager Installation Instructions. You 
can access these instructions from the documentation link at the Availability 
Manager web page at the following URL: 


http://www.hp.com/products/openvms/availabilitymanager


The Installation Instructions also include an explanation of how to install and 
use both DECamds and the Availability Manager on the same system. 


2.1 Configuring and Starting the Data Collector 


Configuration tasks include defining logical names and protecting passwords. 
After you complete these tasks, you can start the Data Collector. The following 
sections describe all of these operations. 


2.1.1 Defining Logical Names 


OpenVMS kits for DECamds Version 7.3-2B and Availability Manager
Versions 2.5 and 2.6 provide a template file that system managers can
modify to define the logical names used by the Data Collector. You
can copy the file SYS$MANAGER:AMDS$SYSTARTUP.TEMPLATE to
SYS$MANAGER:AMDS$SYSTARTUP.COM and edit it to change the default
logicals that are used to start the Data Collector and to find its configuration
files.
The most common logicals, especially in a mixed-environment cluster
configuration, are the ones shown in Table 2-1: 


Table 2-1 Common Availability Manager Logical Names 


Logical                      Description

AMDS$GROUP_NAME              Specifies the group that this node will be
                             associated with when it is monitored.

AMDS$DEVICE                  For nodes with more than one network adapter,
                             allows you to specify which adapter the
                             Availability Manager should use.

AMDS$RM_DEFAULT_INTERVAL     The number of seconds between multicast “Hello”
                             messages from the Data Collector to the Data
                             Analyzer node when the Data Collector is inactive
                             or minimally active.

                             The minimum value is 5. The maximum value is 300.

AMDS$RM_SECONDARY_INTERVAL   The number of seconds between multicast “Hello”
                             messages from the Data Collector to the Data
                             Analyzer node when the Data Collector is active.

                             The minimum value is 5. The maximum value is 1800.
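
Before editing these logicals, a site might check proposed values against the documented limits. The following hypothetical Python helper is not part of any Availability Manager kit; it simply encodes the ranges stated in Table 2-1.

```python
# Documented limits, in seconds, from Table 2-1.
HELLO_INTERVAL_LIMITS = {
    "AMDS$RM_DEFAULT_INTERVAL": (5, 300),
    "AMDS$RM_SECONDARY_INTERVAL": (5, 1800),
}


def check_hello_interval(logical: str, seconds: int) -> int:
    """Return `seconds` if it lies within the documented range for `logical`;
    otherwise raise ValueError. Raises KeyError for an unknown logical."""
    low, high = HELLO_INTERVAL_LIMITS[logical]
    if not low <= seconds <= high:
        raise ValueError(
            f"{logical} must be between {low} and {high} seconds; got {seconds}")
    return seconds


if __name__ == "__main__":
    print(check_hello_interval("AMDS$RM_SECONDARY_INTERVAL", 900))
```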


Note 


Multicast “Hello” messages are notifications from OpenVMS nodes that 
use the Data Collector. 


The Data Collector on each node transmits multicast “Hello” messages that any
Data Analyzer on the local network can receive. The rate at which these messages
are transmitted is regulated by the settings of the following logicals:


AMDS$RM_DEFAULT_INTERVAL 
AMDS$RM_SECONDARY_INTERVAL 


These logicals are defined in SYS$MANAGER:AMDS$LOGICALS.COM.
The shorter the time interval, the faster the node is found and configured
(although the result is slightly more network traffic).
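
The trade-off can be made concrete with a little arithmetic. The Python sketch below is a simplified, hypothetical model: it assumes one “Hello” message per interval per node, no lost packets, and a Data Analyzer that starts listening at an arbitrary moment, so the longest it can wait for the next message is one full interval.

```python
from typing import Tuple


def hello_tradeoff(interval_seconds: int) -> Tuple[int, int]:
    """Return (whole Hello messages per hour per node,
    worst-case seconds until the next Hello) under simplified assumptions."""
    messages_per_hour = 3600 // interval_seconds
    worst_case_wait = interval_seconds
    return messages_per_hour, worst_case_wait


if __name__ == "__main__":
    for seconds in (5, 60, 300):
        per_hour, wait = hello_tradeoff(seconds)
        print(f"{seconds:>4} s interval: {per_hour:>3} messages/hour, "
              f"next Hello within {wait} s")
```

At the minimum interval of 5 seconds a node sends 720 messages per hour; at 300 seconds it sends 12, but a newly started Data Analyzer may wait up to 5 minutes to hear from it.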

2.1.2 Protecting Passwords 


To change passwords to allow a Data Analyzer to monitor a node, edit the 
following file: 


SYS$MANAGER:AMDS$DRIVER_ACCESS.DAT


The passwords section of the file is close to the end of the file, after the Password 
documentation section. The passwords in this file correspond to the passwords 
in the Security page shown in Section 7.9.1. Note that you can specify a list of 
passwords in this file. See the comments in the file for details. 
2.1.3 Starting the Data Collector


Starting with OpenVMS Version 7.2, the files needed to run the Data Collector 
on OpenVMS nodes are shipped with the OpenVMS operating system. However, 
if you want the latest Data Collector software, you need to install it from the 
Availability Manager Data Collector kit. Once the Data Collector is running on a 
node, you can monitor that node using DECamds or the Availability Manager. 


For the Data Collector to accept requests to collect data and to support the Data
Analyzer, you must start the Data Collector by entering the START command:


$ @SYS$STARTUP:AMDS$STARTUP START


To start the Data Collector when the system boots, add the following command to 
the SYS$MANAGER:SYSTARTUP_VMS.COM file: 


$ @SYS$STARTUP:AMDS$STARTUP START


If you make changes to either AMDS$DRIVER_ACCESS.DAT or
AMDS$LOGICALS.COM, you must restart the driver to load the changes.
Enter the following command:


$ @SYS$STARTUP:AMDS$STARTUP RESTART


Note 


You can start the Data Collector on all the nodes in a cluster by using the
following SYSMAN commands:


$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO @SYS$STARTUP:AMDS$STARTUP START
SYSMAN> EXIT
$


2.2 Starting the Data Analyzer 


This section describes what you need to do after the Availability Manager Data
Analyzer is installed. Starting the Data Analyzer is somewhat different on
OpenVMS than on Windows systems. However, on both systems, starting the
Data Analyzer automatically starts the Java™ graphical user interface (GUI),
which allows you to view information that is collected from the Data Collector
nodes.


The following sections contain the sequence of steps required to start the Data 
Analyzer on an OpenVMS node and a Windows node. 


2.2.1 How to Start the Data Analyzer on an OpenVMS Alpha or I64 Node 


To start a Data Analyzer on an OpenVMS Alpha or I64 node, make sure the Data 
Analyzer has been installed on the node from which you wish to monitor other 
nodes. 


To start the Data Analyzer, enter the following command: 


$ avail/avail 
The Availability Manager then displays the main System Overview window,
which is shown in Figure 2-1. 
Note 


For a list of qualifiers you can use with the $ AVAIL/AVAIL command, see 
the HP Availability Manager Installation Instructions. 


2.2.2 How to Start the Data Analyzer on a Windows Node 


Make sure the Data Analyzer has been installed on the Windows node from
which you want to monitor other nodes. Installation instructions are at the
documentation link at the following Web site:


http://www.hp.com/products/openvms/availabilitymanager

Starting the Data Analyzer 

To start the Data Analyzer, follow these steps: 

1. Choose the following options beginning with the Windows Start menu: 
Start --> Programs

2. Choose Availability Manager. 

3. Choose Data Analyzer Startup. 


The Availability Manager then displays the main System Overview window, 
which is shown in Figure 2-1. 


2.3 Using the System Overview Window 


Once it starts, the Availability Manager watches for multicast “Hello” messages 
from nodes running the Data Collector. It follows these steps: 


1. After receiving a multicast “Hello” message from the Data Collector, the 
Availability Manager attempts to connect to a node. This is called the 
attempting collection state. 


The Availability Manager notifies you of this and other states in the System 
Overview window, which is shown in Figure 2-1. 


2. The Availability Manager performs a security check on each node as it 
establishes contact with it. 


• If the node passes the security check while the Availability Manager is
attempting the connection, the connection succeeds, and data collection
starts. This is called the data collection state.


• If the node fails the security check, that node is in the connection failed
state.


3. While the Availability Manager collects data, if a node goes down, or a 
network connection fails between the graphical user interface and the node, 
that node is placed in the path lost state. 
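
The sequence of states above can be sketched as a small state machine in Python. The trigger names are hypothetical labels for the steps just described, and only the transitions named in this section are modeled; what happens after a node reaches the path lost or connection failed state is not covered here.

```python
from enum import Enum
from typing import Optional


class NodeState(Enum):
    ATTEMPTING_COLLECTION = "attempting collection"
    DATA_COLLECTION = "data collection"
    CONNECTION_FAILED = "connection failed"
    PATH_LOST = "path lost"


# (current state, trigger) -> next state. None means "no contact yet".
TRANSITIONS = {
    (None, "hello_received"): NodeState.ATTEMPTING_COLLECTION,
    (NodeState.ATTEMPTING_COLLECTION, "security_check_passed"):
        NodeState.DATA_COLLECTION,
    (NodeState.ATTEMPTING_COLLECTION, "security_check_failed"):
        NodeState.CONNECTION_FAILED,
    (NodeState.DATA_COLLECTION, "node_or_network_down"): NodeState.PATH_LOST,
}


def advance(state: Optional[NodeState], trigger: str) -> NodeState:
    """Return the state that follows `state` when `trigger` occurs."""
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {trigger!r}") from None


state = None
for trigger in ("hello_received", "security_check_passed", "node_or_network_down"):
    state = advance(state, trigger)
    print(state.value)  # attempting collection, data collection, path lost
```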


The colors of the icons preceding each node name in Figure 2-1 indicate the state
of the node.
Figure 2-1 System Overview Window
The color code of each node state is explained in Table 2-2.


Table 2-2 Explanation of Color Codes in the System Overview Window


Color    Description

Brown    Attempts to configure nodes have failed (for example, because the
         nodes are in a connection failed state). A tooltip, which is
         described in Section 2.3.2.1, explains the reason for the failure.

Yellow   Nodes are in the attempting collection state; that is, the security
         check of the nodes is in progress. Nodes that remain in this state
         more than several seconds indicate network connectivity problems
         with the Data Analyzer.

Black    Nodes are in a path lost state; that is, the network path to the
         node has been lost or the node is not running.

Red      Nodes are in the data collection state (that is, they are
         collecting data), but the nodes have exceeded a threshold, causing
         events to be posted. Note that if an event causes the output of any
         message besides an informational one, a node is displayed in red.

Green    Nodes are in the data collection state; that is, the security check
         was successful, and the nodes are collecting data.
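
The mapping in Table 2-2 from node state to icon color can be summarized in a few lines of Python. This is a hypothetical helper for illustration only; the state strings are the names used in this chapter, and the `has_non_informational_event` flag stands in for the Red-versus-Green rule described in the table.

```python
# Icon colors for the node states in Table 2-2 (data collection handled below).
STATE_COLORS = {
    "connection failed": "Brown",
    "attempting collection": "Yellow",
    "path lost": "Black",
}


def icon_color(state: str, has_non_informational_event: bool = False) -> str:
    """Return the icon color for a node state.

    A node in the data collection state is Red if a posted event produced
    any message other than an informational one, and Green otherwise.
    """
    if state == "data collection":
        return "Red" if has_non_informational_event else "Green"
    try:
        return STATE_COLORS[state]
    except KeyError:
        raise ValueError(f"unknown node state: {state!r}") from None


if __name__ == "__main__":
    print(icon_color("data collection", has_non_informational_event=True))  # Red
```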


The System Overview window is divided into two segments, or panes: the 
Group/Node pane and the Event pane. 
2.3.1 Using the Group/Node Pane


When you start the Availability Manager, the Group/Node pane, at the top of the
System Overview window (see Figure 2-1), displays OpenVMS groups and nodes
that the Data Analyzer has found. By default, within each group, the Availability
Manager displays the nodes with which it can establish a connection. (If the Data
Analyzer finds Windows nodes, those are also displayed.)


2.3.1.1 Setting Up Groups 
Groups are set up during installation on Data Collector nodes and are
user-definable. Be sure to define groups by cluster membership. If a node is not
a member of a cluster, then you can define a group by function, type of hardware,
or geographical location.


If you want to change the groups being monitored, you need to use a 
customization option to make changes. See Section 7.4.1 for instructions. 


Note 


HP recommends that you define a cluster as its own group. This is 
necessary for the Lock Contention, Disk Summary, Disk Volume, and 
Cluster data collections to function correctly. 


2.3.1.2 Displaying Group Information 
Groups, and the nodes in each group with which the Availability Manager is able
to establish a connection, are displayed in the Group/Node pane of the System
Overview window (see Figure 2-1).


To display only groups in the Group/Node pane, click the handle in front of a
group name to turn it to a horizontal position; the nodes in that group are then
hidden, as shown for both groups in Figure 2-2. (Clicking the handle back to a
vertical position displays the nodes again.)


Figure 2-2 Group Overview Pane 
The numbers in parentheses after “OpenVMS” (in the Group/Node pane of the
System Overview window) are the following: 


— The first number in parentheses is the total number of groups that are listed. 


— The second number in parentheses is the total number of nodes in all the 
listed groups with which the Availability Manager can establish a connection. 


On each group name row, following the name of the group, the number in 
parentheses is the number of nodes in that group with which the Availability 
Manager has established a connection. 


On a group name row, the color-coded numbers under the OS Version heading
indicate how many nodes in that group are in each of the five color-coded
states. These states are explained in Table 2-2.
Additional summary information about the entire group is on the group line.
CPU, MEM, BIO, and DIO numbers are averages. The rest of the numbers are
totals for all of the nodes in the group.
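
As an illustration of how a group line combines its nodes' values, the following Python sketch averages the CPU, MEM, BIO, and DIO columns and totals every other column. It assumes a simple unweighted mean and that every node row has the same numeric columns; the sample node values are made up.

```python
from typing import Dict, List

# Columns averaged on the group line; every other numeric column is totaled.
AVERAGED_COLUMNS = {"CPU", "MEM", "BIO", "DIO"}


def group_line(nodes: List[Dict[str, float]]) -> Dict[str, float]:
    """Compute a group-line summary from per-node numeric values."""
    if not nodes:
        return {}
    summary = {}
    for column in nodes[0]:
        values = [node[column] for node in nodes]
        if column in AVERAGED_COLUMNS:
            summary[column] = sum(values) / len(values)  # unweighted mean
        else:
            summary[column] = sum(values)                # total for the group
    return summary


if __name__ == "__main__":
    nodes = [
        {"CPU": 20, "BIO": 100, "Events": 1},
        {"CPU": 40, "BIO": 300, "Events": 2},
    ]
    print(group_line(nodes))  # {'CPU': 30.0, 'BIO': 200.0, 'Events': 3}
```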


Notice the small triangle in the BIO heading in Figure 2-2. The direction of the
triangle indicates that the nodes are sorted in descending order of BIO rates.
Click the triangle to reverse the sort order, or click another column header
to select a new item on which to sort data.


In the Group/Node pane, only nodes within a group are sorted. The groups
remain in alphabetical order. You can sort groups in the Group Overview window
by changing the sort order of one of the data column headings (see Figure 2-2).
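
The Group/Node pane sorting rule (groups stay alphabetical; only the nodes inside each group are reordered) can be sketched in Python as follows. The function and the sample group and node names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def sort_group_node_pane(groups: Dict[str, List[Node]], column: str,
                         descending: bool = True) -> List[Tuple[str, List[Node]]]:
    """Keep groups in alphabetical order; sort only the nodes inside each group."""
    return [
        (name, sorted(groups[name], key=lambda node: node[column],
                      reverse=descending))
        for name in sorted(groups)
    ]


if __name__ == "__main__":
    groups = {
        "PRODUCTION": [{"name": "NODEA", "BIO": 50}, {"name": "NODEB", "BIO": 900}],
        "DEBUG": [{"name": "NODEC", "BIO": 10}, {"name": "NODED", "BIO": 75}],
    }
    for group, members in sort_group_node_pane(groups, "BIO"):
        print(group, [node["name"] for node in members])
    # DEBUG ['NODED', 'NODEC']
    # PRODUCTION ['NODEB', 'NODEA']
```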


2.3.2 Displaying Node Information 


The Group/Node pane of the System Overview window allows you to focus on 
resource usage activity at a high level and to display more specific data whenever 
you want. This section explains the basic use of the Group/Node pane. For more 
information, see Chapter 3. 


2.3.2.1 Displaying Summary Node Information 


Even when nodes are not displayed in the System Overview window or the
Group/Node pane, you can display important node information by placing the
cursor over a group name or icon. For example, if you hold the cursor over the
DECAMDS group name, a tooltip similar to the one shown in Figure 2-3 is
displayed, containing summary node information.


Figure 2-3 Tooltip Example: Summary Node Information 
Possible tooltip colors and their meanings are in Table 2-3.


Table 2-3 Explanation of Tooltip Colors


Color    Meaning

Brown    Indicates why the configuration of the node failed.

Yellow   Shows the number of RM Driver multicast “Hello” messages received and
         the number of attempts to configure the node (“Configuration packets
         sent”). Nodes that remain in this state more than several seconds
         indicate network connectivity problems with the Data Analyzer.

Black    Shows the following:

         For nodes that were in the data collection state (see Table 2-2),
         and communication was then lost:
         - When the connection to the node was lost (“Path lost at time”).
         - When that node was booted (“Boot time: time”).
         - What the uptime of the node was (“Uptime: time”).

         For nodes that were in the connection failed state (see Table 2-2):
         - When the connection to the node was lost (“Path lost at time”).
         - The reason the node was not configured.

Red      Nodes have exceeded a threshold, causing events to be posted for the
         node. If an event causes the output of any message besides an
         informational one, a node is displayed in red.

Green    The security check was successful, and the nodes are collecting data;
         node uptime is shown.


The Group/Node pane is designed to display monitored nodes in a single pane. 
This format works well for sites that have relatively few nodes to monitor. 
However, for large sites that have many groups and nodes, scrolling through the 
display can be time-consuming. To help those with large sites, two additional 
windows are available: 


• The Group Overview window


• The Single-Group window


2.3.2.2 Displaying a Group Overview Window 
The first window to help you view large sites is the Group Overview window. To 
view all the group name row data easily, click on the View menu at the top of the 
page and select “Group Overview.” The Group Overview window that is displayed 
(Figure 2-4) is similar to the Group Overview pane in Figure 2-2. 


Figure 2-4 Group Overview Window 
This display is designed to provide an overview of all the groups being monitored.
If you want more information about a group, place the cursor over the group 
name or icon. A tooltip is displayed with additional information about nodes in 
the group similar to the one displayed in Figure 2-3. 


You can also double-click a group name to display a Single-Group window, as 
explained in Section 2.3.2.3. 
2.3.2.3 Displaying a Single-Group Window


The second window to help you view large sites is the Single-Group window. This 
display shows the nodes in one group (see Figure 2-5). 


To obtain this display, you can also right-click the group name in the Group/Node
pane and select the “Display” option. A separate window appears with only the
nodes in the group you have selected (see Figure 2-5). This window is useful
for displaying, at the same time, groups that are not adjacent in the list in the
Group/Node pane.


Figure 2-5 OpenVMS Single-Group Window 
Within each group of nodes displayed, the Availability Manager displays all
the nodes with which it can communicate. If some nodes in the group are not
displayed, it is because the Availability Manager has not received a multicast
“Hello” message from the Data Collector on those nodes.


The display includes the following items: 


• A list of the nodes in the group along with summary data for each node. In
Figure 2-1, the Debug cluster group contains 9 nodes.


• A color-coded monitor icon preceding each node name indicates the state of
the node. See Table 2-2 for explanations of the states these colors indicate.


• For various node data items, some graphs indicate the percentage of an item
that is being used; other graphs are totals.


Green graphs indicate percentages below a customized threshold; red graphs
indicate percentages above a customized threshold. Some data items are
numbers, not percentages; for example, CPUs, CPU queues, and events.


More information about node data is in Chapter 3. 


Somewhat different information is displayed for a group of Windows nodes. For 
more information, see Section 3.1.2. 


2.3.2.4 Focusing On a Specific Node 


To display more information about an individual node, double-click a node name
in the Single-Group window or the Group/Node pane. You can also right-click
a node name and select the “Display...” option. The Availability Manager
displays the Node Summary page shown in Figure 2-6. (The data on this page is
explained in more detail in Chapter 3.)




Figure 2-6 OpenVMS Node Summary 


At the top of the Node Summary page are tabs that correspond to types of node 
data displayed in the Group/Node pane. If you double-click a field under a 
column heading in the Group/Node pane, the Availability Manager displays a 
page that provides more information about that field. For example, if you
double-click a value under “CPU”, the Availability Manager displays a page
similar to the one shown in Figure 3-6.


2.3.2.5 Specifying Data to Be Collected 
By default, the only data collected for a node is the data displayed in the Node 
pane (Figure 2-5). This data is called a node summary data collection. The 
events in the Event pane of the System Overview window (see Figure 2-1) are
produced when node summary data is processed. See Appendix C for a list of 
events associated with node summary data. 


If you want to signal additional events that are listed in Appendix C, you must 
collect the data associated with those events. To collect this data by default, you 
must enable background data collection for the data. Background and foreground 
data collections are explained in more detail in Section 1.4.1.2. 


For OpenVMS nodes, if you want background data collection (and the associated 
event detection), you must turn on data collection for each type of data you want 
to collect. On Windows nodes, background data collection is always enabled and 
cannot be turned off. 


To turn on various types of data to be collected, follow these steps:

1. In the System Overview window (Figure 2-1), click the Customize menu.


2. Click Customize OpenVMS....


3. Click the Data Collection tab.


The Availability Manager then displays the Data Collection Customization page 
(Figure 2-7).


Figure 2-7 Data Collection Customization 
The following types of data are collected by default:


• Node summary

• Single disk

• Single process

To turn on a type of data collection, select the checkbox for that type of data
collection in the “Collect” column. For example, to collect CPU process data, check
the checkbox for “CPU process” in the Collect column. Clicking the checkbox
again clears it.


When you click a data collection name, the Explanation section at the bottom
of the page tells where the data for a particular data collection is displayed.
Table 7-3 summarizes this information.


You cannot turn off the collection of single disk and single process data. These 
types of data are collected by default when you open a Single Disk Summary page 
or a Process Information page, respectively. 


On the Data Collection Customization page, you can change the intervals at 
which data is collected. Collection intervals are explained in Chapter 7. 
2.3.2.6 Sorting Data


You can sort data in many OpenVMS displays. The following list provides some 
examples. To sort the values in a field, click the corresponding column heading. 
To reverse the sort order, click the column heading again. 


• Event pane of the System Overview window (Figure 2-1)

• CPU Process Summary pane (Figure 3-8)

• Memory page (Figure 3-10)

• Bottom pane of I/O Summary page (Figure 3-12)

• Disk Status Summary page (Figure 3-14)

• Disk Volume Summary page (Figure 3-16)


Depending on the field, you can sort data alphabetically or numerically. An 
alphabetical sort is performed using ASCII character values; for example, dollar 
signs ($) precede letters in the sort order. 
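
The difference between an alphabetical (ASCII) sort and a numeric sort can be illustrated with Python's built-in `sorted()`, which compares strings by character code point; for ASCII text this is the same as ASCII order. The helper and the sample values are hypothetical and only mimic the behavior described above.

```python
def sort_column(values, numeric=False, descending=False):
    """Sort display values alphabetically (ASCII order) or numerically."""
    key = float if numeric else None  # numeric fields compare by value
    return sorted(values, key=key, reverse=descending)


if __name__ == "__main__":
    print(sort_column(["VMS1", "$DISK", "alpha", "Beta"]))
    # ['$DISK', 'Beta', 'VMS1', 'alpha']  ($ < uppercase < lowercase in ASCII)
    print(sort_column(["100", "9", "10"]))
    # ['10', '100', '9']  (character order, not numeric order)
    print(sort_column(["100", "9", "10"], numeric=True))
    # ['9', '10', '100']
```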


2.3.3 Using the Event Pane 


The Event pane occupies the bottom part of the System Overview window
(Figure 2-1). In this pane, the Availability Manager displays events that occur on 
all the nodes being monitored on your system, including nodes that might not be 
displayed currently in the Group/Node pane. 


Events signal potential problems that might require further investigation. An 
event must reach a certain level of severity to be displayed. You can customize 
the severity levels at which events are displayed (see Chapter 7). For more 
information about displaying events, see Chapter 5. 


The events that are signaled depend on the types of data collection that are
performed (see Section 2.3.2.5).


In the System Overview window, you can change the size of the panes as well as
the width of specific fields. To move the border between two fields, place the
mouse pointer on the border until a double-headed arrow is displayed, and then
drag the border to the right or left.


Scroll bars indicate whether you are displaying all or part of a pane. For example, 
clicking a right arrow on a scroll bar allows you to view the rightmost portion of 
a screen. 


2.3.4 Other System Overview Window Components 


In addition to panes, the System Overview window (Figure 2-1) also includes
features such as a title bar, menu bar, and status bar: 


Title bar 
The title bar runs across the top of the window and contains the product name 
and version. 


Menu bar 
The menu bar, immediately below the title bar, contains the following menu 
options: 


• File


The File menu contains the Exit option, which allows you to stop the Data
Analyzer and close the window.


• Customize


The Customize menu contains options that allow you to customize various
aspects of the Availability Manager. These options are explained in Chapter 7.


• Help


The Help menu offers different types of online help for the Availability
Manager. These options are explained in Section 2.4.


Status bar 


The status bar, which runs across the bottom of the window, displays the name of 
the selected group and the number of nodes in that group. 


2.4 Getting Help 


To obtain online help, click on the Help menu on the System Overview window 
menu bar. Then choose one of the following options, which are displayed at the 
top of the page. 


Menu Option                      Description

Availability Manager Help        Information about using the Availability
                                 Manager.

Getting Started                  A special online version of help for getting
                                 started using this tool.

Availability Manager Release     Last-minute information about the software and
Notes                            how it works.

About Availability Manager...    Information about this Availability Manager
                                 release (such as the copyright date).


2.5 Printing a Display 


The Availability Manager does not provide a print-screen capability. However,
you can capture Availability Manager displays and print them by following these
steps:

1. Click the Availability Manager display you want to capture to make it your
active window.
2. Press the key combination Alt + PrintScreen.
This action copies an image of the display to the Windows clipboard. (To
capture the entire screen, press Ctrl + PrintScreen.)
3. Run the Windows Paint program:
Start --> Programs --> Accessories --> Paint
4. Do one of the following:
• Press the key combination Ctrl + V.
• From Paint’s Edit menu, select Paste.
5. Then do one of the following:


• Select an option from Paint’s File menu. For example:


- Save or Save As...: to name the file containing the display image and
place it in a directory that you specify.


- Print: to print the display image on a printer that you select.


• Use one of Paint’s editing options to edit the display image before saving
or printing it.
Note


Before you start this chapter, be sure to read the explanation of data 
collection, events, thresholds, and occurrences as well as background and 
foreground data collection in Chapter 1. HP also recommends completing 
the getting-started steps described in Chapter 2. 


Node summary data is the only data that is collected by default. The Availability 
Manager looks for events only in data that is being collected. 


You can collect additional data in either of the following ways: 


• Opening any display page that contains node-specific data (for example, CPU,
memory, or I/O) automatically starts foreground data collection and event
analysis, except for Lock Contention and Cluster Summary information.
(You must select these tabs individually to start foreground data collection.)
Collection and evaluation continue as long as a page with node-specific data
is displayed.


• Selecting a checkbox on the Data Collection Customization page (which you
can open from the Customize OpenVMS... menu option) enables background
collection of that type of data. Data is collected and events are analyzed
continuously until you clear the checkbox.
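
The rules above for when a type of data is being collected for an OpenVMS node can be sketched in Python. This is a hypothetical model: the data type and page names are illustrative strings, and the Windows behavior (background collection always on) is not modeled.

```python
from typing import AbstractSet

# Per the text, these tabs must be selected individually to start their
# foreground collection; opening another node data page is not enough.
TAB_ONLY_TYPES = {"Lock Contention", "Cluster Summary"}


def is_collected(data_type: str, open_pages: AbstractSet[str],
                 background_enabled: AbstractSet[str]) -> bool:
    """Return True if `data_type` is being collected for one OpenVMS node."""
    if data_type == "Node summary":
        return True                      # the only data collected by default
    if data_type in background_enabled:
        return True                      # checked on the Data Collection page
    if data_type in TAB_ONLY_TYPES:
        return data_type in open_pages   # needs its own tab to be open
    return bool(open_pages)              # any node data page starts foreground collection


if __name__ == "__main__":
    print(is_collected("CPU process", {"CPU"}, set()))      # True
    print(is_collected("Lock Contention", {"CPU"}, set()))  # False
```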


For additional information about how to change these settings, see Chapter 7. 


This chapter describes the node data that the Availability Manager displays by 
default and more detailed data that you can choose to display. Differences are 
noted whenever information displayed for OpenVMS nodes differs from that 
displayed for Windows nodes. 


Although Cluster Summary is one of the tabs displayed on the OpenVMS Node 
Summary page (Figure 3-4), see Chapter 4 for a detailed discussion of OpenVMS
Cluster data. 


Note 


On many node displays, you can hold the cursor over a data field or 
column header to display an explanation of that field or header in a small 
rectangle, called a tooltip. Figure 3-2 contains an example. 
3.1 Group/Node Pane


The Availability Manager automatically displays data for each node within 
the groups displayed in the Group/Node pane of the Application window 
(Figure 3-1). 


Figure 3-1 OpenVMS Group/Node Pane 
Recall that the colors of the icons represent the following states:


Color    Description

Brown    Attempts to configure the node have failed (for example, because
         the node is in a connection failed state).

Yellow   Node security check is in progress.

Black    Network path to node has been lost, or the node is not running.

Red      Security check was successful. However, a threshold has been
         exceeded, and an event has been posted.

Green    Security check was successful; data is being collected.


If you hold the cursor over a node name, the Availability Manager displays
a tooltip explaining the specific reason for the color that precedes the node
name. By holding the cursor over many column headers and some data items on
Availability Manager screens, you can display tooltips. Figure 3-2 is an example
of a tooltip that explains the BIO column header in the Group/Node pane.


Figure 3-2 Sample Tooltip 
The colors and their meanings are in Table 3-1.


Table 3-1 Explanation of Tooltip Colors in the Group/Node Pane


Color    Meaning

Brown    Indicates why the configuration of the node failed.

Yellow   Shows the number of RM Driver multicast “Hello” messages and the
         number of attempts to configure the node (“Configuration packets
         sent”). Nodes that remain in this state more than a few seconds
         indicate network connectivity problems with the Data Analyzer.

Black    Shows one of the following:

         If the node was successfully configured and then lost:
         - When the connection to the node was lost (“Path lost at time”).
         - When that node was booted (“Boot time: time”).
         - What the uptime of the node was (“Uptime: time”).

         If the node was never configured:
         - When the connection to the node was lost (“Path lost at time”).
         - The reason the node was not configured.

Red      If an event causes the output of any message besides an
         informational one, a node is displayed in red.

Green    Nodes are in the data collection state.


The following sections describe the data displayed for OpenVMS and Windows 
Group/Node panes. 


3.1.1 OpenVMS Node Data 


Node data with a graph displayed in red indicates that the amount is above the 
threshold set for the field. For each OpenVMS node and group it recognizes, 
the Availability Manager displays the data described in Table 3-2. This table 
also lists the abbreviation of the event that is related to each type of data, 
where applicable. See Section 7.8 for information about setting event thresholds. 
Appendix B describes OpenVMS and Windows events. 


You can sort the data displayed in the Node pane by clicking a column header.
To reverse the sort order of a column of data, click the column header again.


Table 3-2 OpenVMS Node Data

Data          Description of Data                                  Related Event

Node Name     Name of the node being monitored.                    n/a

CPU¹          Percentage of CPU usage of all processes on the      HICOMQ, HIMTTO,
              node.                                                PRCCUR, PRCPUL

Active CPUs   The number of active CPUs over the number of CPUs    n/a
              in the potential set. The potential set is the
              maximum number of CPUs available to the node.

MEM           Percentage of space in memory that all processes     LOMEMY
              on the node use.

BIO           Buffered I/O rate of processes on the node.          HIBIOR

DIO           Direct I/O usage of processes on the node.           HIDIOR

CPU Qs        Number of processes in one of the following          HIMWTQ, PRCMWT,
              states: MWAIT, COLPG, PFW, FPG.                      HIPWTQ, PRCPUT

Events        Number of triggered events that are associated       List of relevant
              with this node.                                      events

Proc Ct       Actual count of processes over the maximum           HIPRCT
              number of processes, and the percentage of actual
              to maximum processes.

OS Version    Version of the operating system on the node.         NOPLIB, UNSUPP

HW Model      Hardware model of the node.                          NOPLIB, UNSUPP

HW Arch       Hardware architecture: Alpha or VAX.                 n/a

¹By default, the CPU heading follows Node Name on a line of Node pane data.
You can use the cursor to move a column heading to another location on the line.
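Proc Ct and Active CPUs both present a count over a maximum. The following Python sketch shows the arithmetic behind the Proc Ct percentage. The helper names are hypothetical, and the slash-separated "actual/maximum" format is an assumption about how the pane renders the value, not something this guide specifies.

```python
def percent_of_maximum(actual: int, maximum: int) -> float:
    """Return actual as a percentage of maximum (the Proc Ct percentage)."""
    if maximum <= 0:
        raise ValueError("maximum must be a positive number")
    return 100.0 * actual / maximum


def count_over_maximum(actual: int, maximum: int) -> str:
    """Format 'actual/maximum' (assumed display style for Proc Ct and Active CPUs)."""
    return f"{actual}/{maximum}"


# Hypothetical node: 150 processes with a maximum of 600, and 4 of 4 CPUs active.
print(count_over_maximum(150, 600), percent_of_maximum(150, 600))  # 150/600 25.0
print(count_over_maximum(4, 4))  # 4/4
```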


3.1.2 Windows Node Pane 


Figure 3-3 is an example of a Windows Node pane. From the group you select, 
the Availability Manager displays all the nodes with which it can communicate. 


Figure 3-3 Windows Node Pane 
(Screen image: a Node pane with the columns Node Name, CPU, MEM, DIO, Processes,
Threads, Events, Semaphores, Mutexes, Sections, OS Version, and HW Model, showing
three nodes running Windows NT 4.0.)


For each Windows node in the group, the Availability Manager displays the data 
described in Table 3-3. 


Table 3-3 Windows Node Data

Data          Description

Node Name     Name of the node being monitored.

CPU           Percentage of CPU usage of all the processes on the node.

MEM           Percentage of memory that is in use.

DIO           Direct I/O usage of processes on the node.

Processes     Number of processes on the node.

Threads       Number of threads on the node. A thread is a basic executable
              entity that can execute instructions in a processor.

Events        The number of events on the node. An event is used when two or
              more threads want to synchronize execution.

Semaphores    The number of semaphores on the node. Threads use semaphores
              to control access to data structures that they share with other
              threads.

Mutexes       The number of mutexes on the node. Threads use mutexes to
              ensure that only one thread executes a section of code at a time.

Sections      The number of sections on the node. A section is a portion of
              virtual memory created by a process for storing data. A process
              can share sections with other processes.

OS Version    Version of the operating system on the node.

HW Model      Hardware model of the node.


3.2 Node Data Pages 


The following sections describe node data pages, which you can display in any of 
the following ways: 


• Double-click a data item in the Group/Node or Node pane to display an
  associated page.

• Double-click a node name in the Group/Node or Node pane to display a
  Node Summary page (Figure 3-4). You can then click other tabs on the
  Node Summary page to display the same detailed data that you display by
  double-clicking a data item in the Group/Node or Node pane.

• Double-click an event in the Event pane.

The menu bar on each node data page contains the options described in
Table 3-4.


Table 3-4 Node Data Page Menu Bar

Menu Option   Description                                          For More Information

File          Contains the Close option, which you can choose      n/a
              to exit from the pages.

View          Contains options that allow you to view data         See specific pages.
              from another perspective.

Fix           Contains options that allow you to resolve various   Chapter 6
              resource availability problems and improve
              system performance.

Customize     Contains options that allow you to organize data     Chapter 7
              collection and analysis and to display data by
              filtering and customizing Availability Manager
              data.


The following sections describe individual node data pages. 




3.2.1 Node Summary 


When you double-click a node name, operating system (OS) version, or hardware
model in an OpenVMS Group/Node pane (Figure 2-1) or a Windows Node
pane (Figure 3-3), the Availability Manager displays the Node Summary page
(Figure 3-4).


Figure 3-4 Node Summary 
Summary Information

Model: AlphaServer ES45 Model 2
OS Version: OpenVMS V8.2
Uptime: 4 01:53:28.54
Memory: 8.00 GB
Active CPUs: 4
Configured CPUs: 4
CPU Architecture: Alpha
Max RADs: 1
Serial Number: 00 00 00 00 32 33 30 41 5a 50 53 4a 32 31 32 34
Galaxy ID: 35 34 45 46 52 50 01 0f 11 0f 0f Ba 7d 65 0f 00

OpenVMS Alpha node PRFE45 - Node Summary


On this page, the following information is displayed for the selected node: 


Data Description 

Model System hardware model name. 

OS Version Name and version of the operating system. 

Uptime Time (in days, hours, minutes, and seconds) since the last reboot. 

Memory Total amount of physical memory (in MBs or GBs) found on the system. 

Active CPUs Number of CPUs running on the node. 

Configured CPUs Number of CPUs that are configured to run on the node.

Max RADs Maximum number of resource affinity domains (RADs) for this node.

Serial Number The system’s hardware serial number, retrieved from the Hardware
Restart Parameter Block (HWRPB).

Galaxy ID The Galaxy ID uniquely identifies a Galaxy. Instances in the same
Galaxy have the same Galaxy ID.
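Uptime values such as the one in Figure 3-4 are given as days followed by hours, minutes, seconds, and hundredths of a second. If you copy such values out of the Availability Manager for your own analysis, a minimal Python sketch like the following converts one into a standard timedelta. It assumes exactly that "D HH:MM:SS.CC" layout, and the function name is hypothetical.

```python
from datetime import timedelta


def parse_uptime(text: str) -> timedelta:
    """Parse an uptime such as '4 01:53:28.54' (days, then hh:mm:ss.cc)."""
    days, clock = text.split()              # '4' and '01:53:28.54'
    hours, minutes, seconds = clock.split(":")
    return timedelta(days=int(days), hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))


uptime = parse_uptime("4 01:53:28.54")  # value shown in Figure 3-4
print(uptime.total_seconds())  # 352408.54
```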
3.2.2 CPU Modes and Process Summaries


By clicking the CPU tab, you can display CPU panes that contain more detailed
statistics about CPU mode usage and process summaries than the Node Summary
does. You can use the CPU panes to diagnose issues that CPU-intensive users
or CPU bottlenecks might cause. For OpenVMS nodes, you can also display
information about specific CPU processes.


When you double-click a value under the CPU or CPU Qs heading on either an
OpenVMS Group/Node or a Windows Node pane, or when you click the CPU
tab, the Availability Manager displays the CPU Mode Summary in the top pane
(Figure 3-6) and, by default, CPU Mode Details (Figure 3-7) in the lower pane.
You can use the View menu to select the CPU Process Summary in the lower
pane (Section 3.2.2.4).


CPU mode summaries and process summary panes are described in the following 
sections. Note that there are differences between the pages displayed for 
OpenVMS and Windows nodes. 

3.2.2.1 Windows CPU Modes


Figure 3-5 provides an example of a Windows CPU Modes page. The sample
page contains values for the three CPU modes: user, privileged, and null.


Figure 3-5 Windows CPU Modes 


CPU Modes                 Current    Extreme
User                        30.20      31.98
Privileged                  20.35      21.77
Null                        49.46      48.00
DPCs Queued/sec             18.63      44.92
Interrupts/sec             146.82     173.68

Windows NT Intel node AFFC36 - CPU


The top pane of the Windows CPU Modes page is a summary of Windows CPU 
usage, listed by type of mode. 


On the left, the following CPU modes are listed:

• User
• Privileged
• Null


On the graph, values that exceed thresholds are displayed in red. To the right of 
the graph are current and extreme amounts for each mode. 
Current and extreme amounts are also displayed for the following values:

• Deferred procedure calls (DPCs) queued per second
• Interrupts that occurred per second


The lower pane of the Windows CPU Modes page contains mode details. The
following data is displayed:

Data           Description

CPU ID         Decimal value representing the identity of a processor in a
               multiprocessing system. On a uniprocessor, this value is always
               CPU #00.

Mode %         Graphical representation of the percentage of active modes on that
               CPU. The color displayed matches the mode color on the graph in the
               top pane.

DPCs Queued    Rate at which deferred procedure call (DPC) objects are queued to
               this processor’s DPC queue.

DPC Rate       Average rate at which DPC objects are queued to this processor’s
               DPC queue per clock tick.

DPC Bypasses   Rate at which dispatch interrupts were short-circuited.

APC Bypasses   Rate at which kernel asynchronous procedure call (APC) interrupts
               were short-circuited.


3.2.2.2 OpenVMS CPU Mode Summary and Process States 


Figure 3-6 shows sample OpenVMS CPU Mode Summary and CPU Process 
States, which are the left and right top panes of the CPU Modes page. 


Figure 3-6 OpenVMS CPU Mode Summary and Process States 
(Screen image: the CPU Mode Summary graph on the left, with current and extreme
percentages for modes such as Interrupt, Compatibility, and MP synch; the CPU
Process States display on the right, with counts for states such as MWAIT, COLPG,
and FPG and a total of 145 processes.)


CPU Mode Summary 
In the CPU Mode Summary section of the pane, percentages are averaged across
all the CPUs and are displayed as a single value on symmetric multiprocessing
(SMP) nodes.


To the left of the graph is a list of CPU modes. The bars in the graph represent 
the percentage of CPU cycles used for each mode. To the right of the graph are 
current and extreme percentages of time spent in each mode. 
Below the graph, the Availability Manager displays the COM and WAIT process
queues:

• COM: The value displayed is the number of processes in the COM and COMO
  states.

• WAIT: The value displayed is the number of processes in the miscellaneous
  WAIT, MWAIT, COLPG, CEF, PFW, and FPG states.
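The COM and WAIT values are sums over groups of process states. The Python sketch below assumes you already have a count of processes per state; the grouping follows the list above, with "miscellaneous WAIT" taken to mean the MWAIT state (an assumption). The function name and the state counts are hypothetical.

```python
# State groupings from the COM and WAIT descriptions above.
# Assumption: "miscellaneous WAIT" refers to the MWAIT state.
COM_STATES = {"COM", "COMO"}
WAIT_STATES = {"MWAIT", "COLPG", "CEF", "PFW", "FPG"}


def queue_lengths(state_counts: dict) -> tuple:
    """Return (COM, WAIT) totals from a mapping of process state -> count."""
    com = sum(n for state, n in state_counts.items() if state in COM_STATES)
    wait = sum(n for state, n in state_counts.items() if state in WAIT_STATES)
    return com, wait


# Hypothetical snapshot of process-state counts on one node.
counts = {"COM": 3, "COMO": 1, "LEF": 40, "HIB": 20, "CEF": 2, "PFW": 1}
print(queue_lengths(counts))  # (4, 3)
```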


CPU Process States 
The right side of Figure 3-6 shows a sample CPU Process States display. Note
that the value for MWAIT, in the left column, is the sum of all values for the
states in the two right columns.


This display shows the number of processes in each process state. This number
is tallied from the data in the CPU Process view of the CPU page (Figure 3-6).
For systems with many processes, the data in the CPU Process view is collected
in segments over a short period of time because the amount of data a network
packet can contain is limited. Because of this, the number of processes in the
Process States pane might differ slightly from what is reported by $MONITOR
STATES.
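Tallying process states amounts to counting the state column of the per-process rows. In this Python sketch the (PID, state) records are hypothetical; if real rows were gathered in segments at slightly different times, as described above, the totals are only as consistent as those samples.

```python
from collections import Counter

# Hypothetical (PID, state) rows standing in for the CPU Process view.
processes = [
    ("00000101", "COM"),
    ("00000102", "LEF"),
    ("00000103", "COM"),
    ("00000104", "HIB"),
    ("00000105", "LEF"),
]

# Count how many processes are in each state.
state_counts = Counter(state for _pid, state in processes)

print(state_counts["COM"], state_counts["LEF"], state_counts["HIB"])  # 2 2 1
print(sum(state_counts.values()))  # 5
```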


Appendix A contains explanations of the CPU process states. 


3.2.2.3 OpenVMS CPU Mode Details


The lower pane of the CPU Modes page contains CPU mode details, as shown in 
Figure 3-7. 
Figure 3-7 OpenVMS CPU Mode Details Pane


(Screen image: the pane lists 30 CPUs, numbered from #000 to #031, each in the
Run state, with columns for the mode graph, PID, Process Name, Capabilities, and
RAD. Most CPUs show *** None *** as the process name; several show CTM$
processes. Each CPU lists the RUN and QUORUM capabilities and RAD 0, and
CPU #000 also lists PRIMARY.)

OpenVMS Alpha node QTV18 - CPU Summary - physical modes view: 30 physical cpus (30 listed, 0 filtered out)


In the OpenVMS CPU Mode Details pane, the following data is displayed: 


Data           Description

CPU ID         Decimal value representing the identity of a processor in a
               multiprocessing system. On a uniprocessor, this value is always
               CPU #00.

State          One of the following CPU states: Boot, Booted, Init, Rejected,
               Reserved, Run, Stopped, Stopping, or Timeout.

Mode %         Graphical representation of the percentage of active modes on that
               CPU. The color displayed coincides with the mode color in the
               graph in the top pane.

PID            Process identifier (PID) value of the process that is using the
               CPU. If the PID is unknown to the Data Analyzer application, the
               internal PID (IPID) is listed.

Process Name   Name of the process active on the CPU. If no active process is
               found on the CPU, the name is listed as *** None ***.

Capabilities   One or more of the following CPU capabilities or flags:
               • Capabilities: Primary, Quorum, Run, or Vector.
               • Flags: Idle, Lckmgr, Fastpath_CPU, Fastpath_Ports, Low_power,
                 and Cothread_of_nn.

RAD            Number of the RAD where the CPU exists.
3.2.2.4 OpenVMS CPU Process Summary


To display the OpenVMS CPU Process Summary pane at the bottom of the CPU 
page, select CPU Process Summary from the View menu (Figure 3-6). Figure 3-8 
shows a sample OpenVMS CPU Process Summary pane. 


Figure 3-8 OpenVMS CPU Process Summary Pane 


(Screen image: rows for processes FRED1_10 through FRED1_15, with PIDs 216005FF
through 21600604. Each row shows the priority (for example, 6/4 or 4/4), the state
(for example, COM, HIB, or INNER_MODE), the CPU rate and wait percentages, the
accumulated CPU time, and the home RAD.)


The OpenVMS CPU Process Summary pane displays the following data: 


Data Description 

PID Process identifier, a 32-bit value that uniquely identifies a process. 

Process Name Name of the process active on the CPU. 

Priority Computable (xx) and base (yy) process priority in the format xx/yy. 

State One of the process states listed in Appendix A. 

Rate Percentage of CPU time used by this process. This is the ratio of 
CPU time to elapsed time. The CPU rate is also displayed in the 
bar graph. 

Wait Percentage of time the process is in the COM or COMO state. 

Time Amount of actual CPU time charged to the process. 

Home RAD RAD where most of the resources of the process reside.
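The Rate and Priority columns can be reproduced with a little arithmetic. This Python sketch assumes the rate is measured over one sampling interval (CPU seconds consumed divided by elapsed seconds, as a percentage) and that the priority is displayed as "computable/base" per the table above. The function names and sample numbers are hypothetical.

```python
def cpu_rate_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """CPU time consumed in an interval as a percentage of elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


def parse_priority(text: str) -> tuple:
    """Split a priority shown as 'xx/yy' into (computable, base)."""
    computable, base = text.split("/")
    return int(computable), int(base)  # int() ignores surrounding spaces


print(cpu_rate_percent(1.0, 2.5))  # 40.0
print(parse_priority("6/4"))       # (6, 4)
```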


Displaying Single Process Information 

When you double-click a PID on the lower part of an OpenVMS CPU Process 
Summary (Figure 3-8), Memory Summary (Figure 3-10), or I/O Summary 
(Figure 3-12) page, the Availability Manager displays the first of several 
OpenVMS Single Process pages. 


On these pages, you can click tabs to display specific data about one process. 
Alternatively, you can display all of the information on the pages on a single 
vertical or horizontal grid page. 


This data includes a combination of data elements from the CPU Process, 
Memory, and I/O pages, as well as data for specific quota utilization, current 
image, and queue wait time. These pages are described in more detail in 
Section 3.3. 


3.2.3 Memory Summaries and Details 


The Memory Summary and Memory Details pages contain statistics about 
memory usage on the node you select. The Memory Summary pages displayed 
for OpenVMS and Windows nodes are somewhat different, as described in the 
following sections. The Memory Details page exists only for OpenVMS systems. 
3.2.3.1 Windows Memory Summary


To display the Windows Memory Summary page, you can use either of the 
following methods: 


• Double-click a node, and then click the Memory tab (Figure 3-3).

• Double-click a value under the MEM heading (Figure 3-38).


The Availability Manager displays the Windows Memory page (Figure 3-9). 


Figure 3-9 Windows Memory 


Memory (127.42 Megabytes)     Current      Extreme
Available                     93.91 MB     93.85 MB
Cache                         10.66 MB     10.66 MB
Paged Pool                     7.53 MB      7.52 MB
Nonpaged Pool                  2.41 MB      2.40 MB
Committed Bytes               26.16 MB     26.17 MB
Commit Limit                 243.13 MB

Windows NT Intel node AFFC36 - Memory


The Current and Extreme amounts on the page display the data shown in the 
following table. The table also indicates what the graph amounts represent. 


Data              Description

Available         Size (in bytes) of the virtual memory currently on the zeroed,
                  free, and standby lists. Zeroed and free memory are ready for
                  use, with zeroed memory cleared to zeros. Standby memory is
                  removed from a process’s working set but is still available.
                  The graph shows the percentage of physical memory that is
                  available for use.

Cache             Number of bytes currently in use by the system cache. The
                  system cache is used to buffer data retrieved from disk or LAN.
                  The system cache uses memory not in use by active processes on
                  the computer. The graph shows the percentage of physical memory
                  devoted to the cache.

Paged Pool        Number of bytes in paged pool, a system memory area where
                  operating system components acquire space as they complete
                  their tasks. Paged pool pages can be paged out to the paging
                  file when the system does not access them for long periods of
                  time. The graph shows the percentage of physical memory devoted
                  to paged pool.

Nonpaged Pool     Number of bytes in nonpaged pool, a system memory area where
                  operating system components acquire space as they complete
                  their tasks. Nonpaged pool pages cannot be paged out to the
                  paging file; instead, they remain in memory as long as they are
                  allocated. The graph shows the percentage of physical memory
                  devoted to nonpaged pool.

Committed Bytes   Amount of the available virtual memory (the Commit Limit) that
                  is in use. Note that the commit limit can change if the paging
                  file is extended. The graph shows the percentage of the Commit
                  Limit used by the Committed Bytes.

Commit Limit      Size (in bytes) of virtual memory that can be committed without
                  having to extend the paging files. If the paging files can be
                  extended, this limit can be raised.
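The Committed Bytes graph is a ratio of Committed Bytes to the Commit Limit. Using the two values shown in Figure 3-9, a quick Python check of that percentage looks like this (the helper name is hypothetical):

```python
def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


committed_mb = 26.16      # Committed Bytes, Figure 3-9
commit_limit_mb = 243.13  # Commit Limit, Figure 3-9
print(round(percent_of(committed_mb, commit_limit_mb), 2))  # 10.76
```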


3.2.3.2 OpenVMS Memory Summary 


When you double-click a value under the MEM heading in an OpenVMS Node 
pane, or if you click the Memory tab, the Availability Manager displays the 
OpenVMS Memory Summary page (Figure 3-10). 


Alternatively, if you click the View menu on the OpenVMS Memory Summary
page, the following options are displayed in a shortcut menu:

• Memory Summary View
• Memory Details View

You can click Memory Summary View to select the Memory Summary page.


Figure 3-10 OpenVMS Memory Summary 


Memory (1024 Megabytes)     Current     Extreme
Free                         862 MB      860 MB
Used                         150 MB      151 MB
Modified                      12 MB       12 MB
Bad Pages                      0

(Lower pane: a list of processes, such as WBEM$SERVER, DECW$SERVER_0, and
DNS$ADVER, with columns for PID, Process Name, Count, Size, Extent, Rate, and I/O.)

OpenVMS Alpha node AFFS52 - Memory Usage - process view: 59 processes (58 listed, 1 filtered out)


The graph in the top pane of Figure 3-10 shows memory distribution (Free, Used,
and Modified) as absolute values, in megabytes of memory. Current and extreme
values are also listed for each type of memory distribution. (Free memory uses
the lowest seen value as its extreme.) Bad Pages shows the number of pages that
the operating system has marked as bad.
The thresholds that you see in the graph are the ones set for the LOMEMY event.
(The LOMEMY thresholds are also in the display of values for the MEM field in
the OpenVMS Group/Node pane shown in Figure 2-1.)


The lower pane in Figure 3-10 displays the data shown in the following table,
including an abbreviation of the event that is related to each type of data, where
applicable.

Data          Description                                          Related Events

PID           Process identifier. A 32-bit value that uniquely     n/a
              identifies a process.

Process Name  Name of the process.                                 NOPROC, PRCFND

Count         Number of physical pages or pagelets of memory       LOWEXT
              that the process is using (the working set count).

Size          Number of pages or pagelets of memory the            LOWSQU
              process is allowed to use for the working set size
              (also known as the working set list size). The
              operating system periodically adjusts this value
              based on an analysis of page faults relative to
              CPU time used.

Extent        Number of pages or pagelets of memory in the         LOWEXT
              process’s working set extent (WSEXTENT) quota
              as defined in the user authorization file (UAF).
              This value cannot exceed the value of the system
              parameter WSMAX.

Rate          Number of page faults per second for the process.    LOWSQU, LOWEXT,
                                                                   PRPGFL

I/O           Rate of I/O read attempts necessary to satisfy       PRPIOR
              page faults (also known as page read I/O or the
              hard fault rate).


When you double-click a PID on the lower part of the Memory Summary page
(Figure 3-10), the Availability Manager displays an OpenVMS Single Process page
(Figure 3-23), where you can click tabs to display pages containing specific
data about one process. This data includes a combination of data from the CPU
Process, Memory, and I/O pages, as well as data for specific quota utilization,
current image, and queue wait time. These pages are described in Section 3.3.


3.2.3.3 OpenVMS Memory Details 


When you click the View menu on the OpenVMS Memory Summary page
(Figure 3-10), the following options are displayed in a shortcut menu:

• Memory Summary View
• Memory Details View (Alpha only)

To display memory details, select Memory Details View.


The Availability Manager displays the OpenVMS Memory Details page 
(Figure 3-11). 
Figure 3-11 OpenVMS Memory Details

Memory measured in 8192 byte pages

                              Current      Extreme
Total memory                  81920 Pgs    81920 Pgs
Available process memory      74170 Pgs    74170 Pgs
Free list                     70152 Pgs    70152 Pgs
Modified list                   982 Pgs      982 Pgs
Resident code region           1024 Pgs     1024 Pgs
Reserved page count               0 Pgs        0 Pgs
Galactic shared used              0 Pgs        0 Pgs
Galactic shared unused            0 Pgs        0 Pgs
Global read-only                285 Pgs      285 Pgs
Total non-paged pool           1586 Pgs     1586 Pgs
Total free non-paged pool      1204 Pgs     1204 Pgs

RAD0
Free list                     70152 Pgs    70152 Pgs
Modified list                   982 Pgs      982 Pgs
Non-paged pool                 1586 Pgs     1586 Pgs
Free non-paged pool            1204 Pgs     1204 Pgs

OpenVMS Alpha node 2BOYS - Memory Usage and RAD breakdown.


The following data items are in a box at the top left of the page: 


Heading Description 


Successful Expansions Number of successful nonpaged pool expansions. 


Failed Expansions Number of failed attempts to expand nonpaged pool. 


System space replication Whether system space replication is enabled or disabled. 


To the right of the box is a list of the system memory data items that can be
displayed as bar graphs at the bottom of the page. You can toggle each item on
or off to show or hide its bar graph. You can also click a small box to choose
between Linear and Logarithmic bar graph displays.


The system memory data items are described in Table 3-5. 
Table 3-5 System Memory Data


Data Description 

Total memory Total physical memory size, as seen by OpenVMS. 

Available process memory Amount of total physical memory available to processes. 
This is the total memory minus memory allocated to 
OpenVMS. 

Free list Size of the free page list. 

Modified list Size of the modified page list. 

Resident code region Size of the resident image code region. 

Reserved page count Number of reserved memory pages. 

Galactic shared used Galaxy shared memory pages currently in use. 

Galactic shared unused Galaxy shared memory pages currently not in use. 

Global read-only Read-only pages, installed as resident when system space
replication is enabled, that are also replicated for improved performance.

Total nonpaged pool Total size of system nonpaged pool.

Total free nonpaged pool Amount of nonpaged pool that is currently free.
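Because the Memory Details page reports sizes in pages, it can help to convert them. The following Python sketch assumes the 8192-byte page size stated on the page in Figure 3-11 and uses that figure's Total memory and Available process memory values. By the definition in Table 3-5, their difference is the memory allocated to OpenVMS. The function name is hypothetical.

```python
PAGE_SIZE_BYTES = 8192  # "Memory measured in 8192 byte pages" (Figure 3-11)


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a page count to mebibytes (1 MiB = 1,048,576 bytes)."""
    return pages * page_size / (1024 * 1024)


total_pages = 81920              # Total memory
available_process_pages = 74170  # Available process memory

# Available process memory = total memory - memory allocated to OpenVMS,
# so the difference is the OpenVMS allocation.
openvms_pages = total_pages - available_process_pages

print(pages_to_mib(total_pages))                   # 640.0
print(openvms_pages, pages_to_mib(openvms_pages))  # 7750 60.546875
```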


To the right of the system memory data is a list of single RAD data items, which
are described in Section 3.3.7. You can toggle these items to display them in bar
graphs.


Table 3-6 Single RAD Data Items 


Data Description 

Free list Size of the free page list. 

Modified list Size of the modified page list. 

Nonpaged pool Total size of system nonpaged pool. 

Free nonpaged pool Amount of nonpaged pool that is currently free. 


Below the list of single RAD items is a box where you can toggle between 
Percentage and Raw Data to display Current and Extreme values to the right 
of the bar graphs. 


3.2.4 OpenVMS I/O Summary and Page/Swap Files 


By clicking the I/O tab on any OpenVMS node data page, you can display a page 
that contains summaries of accumulated I/O rates. In the top pane, the summary 
covers all processes; in the lower pane, the summary is for one process. 


From the View menu, you can also choose to display (in the lower pane) a list of 
page and swap files. 


3.2.4.1 OpenVMS I/O Summary 


The OpenVMS I/O Summary page displays the rate, per second, at which I/O 
transfers take place, including paging write I/O (WIO), direct I/O (DIO), and 
buffered I/O (BIO). In the top pane, the summary is for all CPUs; in the lower 
pane, the summary is for one process. 
When you double-click a data item under the DIO or BIO heading on the Node
pane, or if you click the I/O tab, the Availability Manager displays, by default, the
OpenVMS I/O Summary (Figure 3-12).


Figure 3-12 OpenVMS I/O Summary 


Percent of Threshold                 Current
Paging Write I/O Rate                   0.00
Direct I/O Rate                         4.30
Buffered I/O Rate                       1.26
Total Page Faults                       0.00
Hard Page Faults                        0.00
System Page Faults                      0.00
Window Turn Rate                        0.00

PID       Process Name  DIO Rate  BIO Rate  PIO Rate  Open Files  DIO Avail  BIO Avail  BYTLM Avail  Files Avail
2040042D  DNS$ADVER         0.00      0.17      0.00           3        100        126        13936           97


The graph in the top pane shows each type of I/O listed in Table 3-7 as a
percentage of its threshold. The table also shows the event that is related to each
data item. For information about setting event thresholds, see Section 7.8.


Table 3-7 I/O Data Displayed

Type of I/O             Description                                         Related Event

Paging Write I/O Rate   Rate of write I/Os to one or more paging files.     HIPWIO

Direct I/O Rate         Rate of transfers between the system devices and    HIDIOR
                        the pages or pagelets containing the process
                        buffer that the system locks in physical memory.

Buffered I/O Rate       Rate of transfers between the process buffer and    HIBIOR
                        an intermediate buffer from the system buffer pool.

Total Page Faults       Total of hard and soft page faults on the system,   HITTLP
                        as well as peak values seen during an Availability
                        Manager session.

Hard Page Faults        Total of hard page faults on the system.            HIHRDP

System Page Faults      Page faults generated by OpenVMS itself.            HISYSP

Window Turn Rate        Number of times that the file extent cache had to   WINTRN
                        be refreshed.


Current and peak values are listed for each type of I/O. Values that exceed 
thresholds set by the events indicated in the table are displayed in red on the 
screen. Appendix B describes OpenVMS and Windows events. 
To the right of the graph, the following values are listed:


Value Description 

Threshold Defined in Event Configuration Properties. 

Current Current value or rate. 

Peak Highest value or rate seen since start of data collection. 
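The Threshold, Current, and Peak values relate to each other in a simple way. The following Python sketch models one statistic from the top pane. The class is hypothetical, the threshold is an invented value rather than a product default, and the sketch assumes a value is shown in red only when it is strictly greater than its threshold.

```python
from dataclasses import dataclass


@dataclass
class IoMetric:
    """Tracks one I/O statistic the way the top pane presents it."""
    threshold: float      # set in Event Configuration Properties
    current: float = 0.0  # most recent value or rate
    peak: float = 0.0     # highest value seen since data collection started

    def update(self, value: float) -> None:
        self.current = value
        self.peak = max(self.peak, value)

    def percent_of_threshold(self) -> float:
        return 100.0 * self.current / self.threshold

    def exceeds_threshold(self) -> bool:
        # Assumption: "exceeds" means strictly greater than the threshold.
        return self.current > self.threshold


# Hypothetical threshold and samples for a Direct I/O Rate series.
dio = IoMetric(threshold=200.0)
for sample in (4.30, 250.0, 100.0):
    dio.update(sample)
print(dio.current, dio.peak, dio.percent_of_threshold(), dio.exceeds_threshold())
# 100.0 250.0 50.0 False
```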


The lower pane displays accumulated I/O rates on a per-process basis. The 
following data is displayed: 

Data          Description

PID           Process identifier. A 32-bit value that uniquely identifies a
              process.
Process Name  Name of the current process.
DIO Rate      Direct I/O rate. The rate at which I/O transfers occur between
              the system devices and the pages or pagelets that contain the
              process buffer that the system locks in physical memory.
BIO Rate      Buffered I/O rate. The rate at which I/O transfers occur between
              the process buffer and an intermediate buffer from the system
              buffer pool.
PIO Rate      Paging I/O rate. The rate of read attempts necessary to satisfy
              page faults (also known as page read I/O or the hard fault
              rate).
Open Files    Number of open files.
DIO Avail     Direct I/O limit remaining. The number of direct I/O operations
              still available before the process reaches its quota. The DIOLM
              quota is the maximum number of direct I/O operations a process
              can have outstanding at one time.
BIO Avail     Buffered I/O limit remaining. The number of buffered I/O
              operations still available before the process reaches its
              quota. The BIOLM quota is the maximum number of buffered I/O
              operations a process can have outstanding at one time.
BYTLM         The number of buffered I/O bytes available before the process
              reaches its quota. BYTLM is the maximum number of bytes of
              nonpaged system dynamic memory that a process can claim at one
              time.
Files         Open file limit remaining. The number of additional files the
              process can open before reaching its quota. The FILLM quota is
              the maximum number of files that can be opened simultaneously by
              the process, including active network logical links.
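Each of the remaining-quota columns is the gap between a quota limit and the amount currently in use. The following sketch shows that arithmetic, assuming you already know each limit and the current usage; the 10 percent warning cutoff, the function name, and the sample numbers are illustrative assumptions, not Availability Manager defaults.

```python
def quota_headroom(limits, in_use, warn_fraction=0.10):
    """Return {quota: (remaining, is_low)} for each quota in `limits`.

    limits        -- maximum per quota, for example {"DIOLM": 150}
    in_use        -- amount currently outstanding per quota (missing = 0)
    warn_fraction -- hypothetical cutoff: flag a quota when the remaining
                     amount is at or below this fraction of its limit
    """
    report = {}
    for name, limit in limits.items():
        remaining = limit - in_use.get(name, 0)
        report[name] = (remaining, remaining <= warn_fraction * limit)
    return report


limits = {"DIOLM": 150, "BIOLM": 150, "BYTLM": 128000, "FILLM": 250}
in_use = {"DIOLM": 145, "BIOLM": 10, "BYTLM": 3008, "FILLM": 14}
report = quota_headroom(limits, in_use)
for quota, (remaining, low) in report.items():
    print(f"{quota}: {remaining} remaining{' (low)' if low else ''}")
```

With these sample values, only DIOLM is flagged: 5 operations remain out of 150.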


When you double-click a PID in the lower pane of the I/O Summary page, the 
Availability Manager displays the OpenVMS Single Process pages, where you can 
click tabs to display specific data about one process. See Section 3.3 for more details. 


3.2.4.2 OpenVMS I/O Page/Swap Files 


Click I/O Page/Swap Files on the View menu of the I/O page to select this option. 
The Availability Manager displays an OpenVMS I/O Page/Swap Files page. The top 
pane displays the same information as the OpenVMS I/O Summary page 
(Figure 3-12). The lower pane contains the I/O Page/Swap Files pane shown in 
Figure 3-13. 
Figure 3-13 OpenVMS I/O Page/Swap Files 


[Figure: I/O Page/Swap Files pane with columns Host Node, File Name, Used, % Used, Total, and Reservable. Sample row: MAWK, DISK$MAWK_PAGE:[SYS0.SYSEXE]PAGEFI..., 84974, 42.49, 199992, -41148. Status line: OpenVMS VAX node MAWK - IO Summary - memory file view, 2 memory files (1 listed, 1 filtered out).]


The I/O Page/Swap Files pane displays the following data: 


Data         Description

Host Name    Name of the node on which the page or swap file resides.
File Name    Name of the page or swap file. For secondary page or swap
             files, the file name is obtained by a special AST to the job
             controller on the remote node. The Availability Manager makes
             one attempt to retrieve the file name.
Used         Number of used blocks in the file.
% Used       Percentage of the blocks in the file that are in use.
Total        Total number of blocks in the file.
Reservable   Number of reservable blocks in each page or swap file currently
             installed. Reservable blocks are blocks that might be logically
             claimed by a process for future physical allocation. A negative
             value is not an immediate concern, but it indicates that the
             file might become overcommitted if physical memory becomes
             scarce.


Notes 


OpenVMS Versions 7.3-1 and higher do not have a page or swap file 
“Reservable” field. The Availability Manager displays N/A in the field for 
these versions of OpenVMS. 


If events for secondary page and swap files are signaled before the Data 
Analyzer has resolved their file names from the file ID (FID), events such 
as LOPGSP display the FID instead of the file name. You can 
determine the file name for the FID by checking the File Name field on 
the I/O Page/Swap Files page, where the FID is displayed after the file name. 
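As a worked check of the % Used and Reservable columns, the sample row in Figure 3-13 (Used 84974, Total 199992, Reservable -41148) gives 84974 / 199992 × 100 ≈ 42.49 percent. The following sketch reproduces that arithmetic and the interpretation of the Reservable field given above; the function name and the wording of its notes are illustrative only.

```python
from typing import Optional, Tuple


def page_file_status(used: int, total: int,
                     reservable: Optional[int] = None) -> Tuple[float, str]:
    """Return (% Used rounded to two places, note on the Reservable value)."""
    percent_used = round(100.0 * used / total, 2)
    if reservable is None:
        # OpenVMS V7.3-1 and later have no Reservable field (shown as N/A).
        note = "N/A"
    elif reservable < 0:
        # Not an immediate concern, but the file might become
        # overcommitted if physical memory becomes scarce.
        note = "might become overcommitted"
    else:
        note = "OK"
    return percent_used, note


# Values from the sample row for node MAWK in Figure 3-13.
print(page_file_status(84974, 199992, -41148))
```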


3.2.5 Disk Summaries 


The Disk tab on the Node Summary page (Figure 3-4) allows you to display disk 
pages that contain data about the availability, count, and errors of disk devices on 
the system. OpenVMS disk data displays differ from those for Windows nodes, as 
described in the following sections. 


On OpenVMS pages, the View menu lets you choose the following disk 
summaries: 


• Status Summary 
• Volume Summary 


Also, on the Disk Status Summary, you can double-click a device name to display 
a Single Disk Summary page. 
3.2.5.1 OpenVMS Disk Status Summary 


To display the default disk page, the OpenVMS Disk Status Summary page 
(Figure 3-14), click the Disk tab on the OpenVMS Node Summary page 
(Figure 3-4). The Disk Status Summary page displays disk device data, including 
path, volume name, status, and mount, transaction, error, and resource wait 
counts. 


Figure 3-14 OpenVMS Disk Status Summary 
[Figure: Disk Status Summary with columns Device Name, Host Path, Volume Name, Status, Error, Trans, Mount, and RWait. Each listed disk (for example, $85$DKA300, $86$DKA1, DSA2, and DSA4) is on host path ANDA2A with status Mounted. Status line: OpenVMS Alpha node ANDA2A - Disk Survey - Status: 315 disks (177 listed, 138 filtered ...]


This summary displays the following data: 


Heading       Description

Device Name   Standard OpenVMS device name that indicates where the device is
              located, as well as a controller or unit designation.
Host Path     Primary path (node) from which the device receives commands.
Volume Name   Name of the mounted media.
Status        One or more of the following disk status values:

              Alloc          Disk is allocated to a specific user.
              CluTran        Disk status is uncertain because of a cluster
                             state transition in progress.
              Dismount       Disk is in the process of dismounting; it may
                             be waiting for a file to close.
              Foreign        Disk is mounted with the /FOREIGN qualifier.
              Invalid        Disk is in an invalid state (most likely Mount
                             Verify Timeout).
              MntVerify      Disk is waiting for a mount verification.
              Mounted        Disk is logically mounted by a MOUNT command.
              Offline        Disk is no longer physically mounted in the
                             device drive.
              Online         Disk is physically mounted in the device drive.
              Shadow Set     Disk is a member of a shadow set.
              Member
              Unavailable    Disk is set to unavailable.
              Wrong Volume   Disk was mounted with the wrong volume name.
              Wrtlck         Disk is mounted and write locked.

Error         Number of errors generated by the disk (a quick indicator of
              device problems).
Trans         Number of in-progress file system operations for the disk.
Mount         Number of nodes that have the specified disk mounted. (These
              nodes must have the Data Collector installed and running to
              participate in the mount count.)
Rwait         Indicator that a system I/O operation is stalled, usually during
              normal recovery from a connection failure or during volume
              processing of host-based shadowing.
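When you scan a long Disk Status Summary, the Status, Error, and Rwait columns are the quickest signs of trouble. The following Python sketch shows one way to express that triage, assuming the row values have been copied out of the page by hand. Which status values count as "needs attention" is an assumption made for this example; this guide does not define such a list.

```python
# Status values from the table above that suggest a closer look.
# This selection is an assumption, not an Availability Manager rule.
ATTENTION_STATUSES = {"CluTran", "Dismount", "Invalid", "MntVerify",
                      "Offline", "Unavailable", "Wrong Volume"}


def disk_attention(statuses, errors, rwait):
    """Return the reasons a Disk Status Summary row stands out."""
    reasons = []
    flagged = sorted(set(statuses) & ATTENTION_STATUSES)
    if flagged:
        reasons.append("status: " + ", ".join(flagged))
    if errors > 0:
        # Error is a quick indicator of device problems.
        reasons.append(f"{errors} error(s)")
    if rwait:
        # Rwait indicates that a system I/O operation is stalled.
        reasons.append("I/O stalled (Rwait)")
    return reasons


print(disk_attention(["Mounted"], errors=0, rwait=0))  # []
print(disk_attention(["Mounted", "MntVerify"], errors=3, rwait=1))
```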


3.2.5.2 OpenVMS Single Disk Summary 
To collect single disk data and display the data on the Single Disk Summary, 
double-click a device name on the Disk Status Summary. Figure 3-15 is an 
example of a Single Disk Summary page. The display interval of the data 
collected is 5 seconds. 


Note that you can sort the order in which data is displayed in the Single Disk 
Summary page by clicking a column header. To reverse the sort order of a column 
of data, click the column header again. 
Figure 3-15 OpenVMS Single Disk Summary 
[Figure: Single Disk Summary for $85$DKA200, with the File, Customize, and Help menus and columns Node, Status, Errors, Trans, RWait, Free, QLen, and OpRate. Most nodes (for example, ADEBUG, ANDA1A, and COBRA3) show Mounted; nodes for which no data is available show ** no data **.]


This summary displays the following data: 


Data     Description

Node     Name of the node.
Status   Status of the disk: mounted, online, offline, and so on.
Errors   Number of errors on the disk.
Trans    Number of in-progress file system operations on the disk (number
         of open files on the volume).
Rwait    Indication of an I/O stalled on the disk.
Free     Number of free disk blocks on the volume.
QLen     Average number of operations in the I/O queue for the volume.
OpRate   Each node’s contribution to the total operation rate (number of
         I/Os per second) for the disk.
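Because OpRate is each node's contribution to the disk's total operation rate, the total for the disk is the sum of the per-node values. A minimal sketch, assuming that nodes showing ** no data ** are simply left out of the sum; the node names are hypothetical.

```python
from typing import Dict, Optional


def disk_total_op_rate(per_node: Dict[str, Optional[float]]) -> float:
    """Sum each node's OpRate contribution for one disk.

    None stands for a node whose row shows ** no data **; such nodes
    are skipped (an assumption made for this sketch).
    """
    return sum(rate for rate in per_node.values() if rate is not None)


rates = {"NODEA": 0.5, "NODEB": None, "NODEC": 1.25}  # hypothetical nodes
print(disk_total_op_rate(rates))  # 1.75
```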


3.2.5.3 OpenVMS Disk Volume Summary 
By using the View option on the Disk Status Summary page (Figure 3-14), you 
can select the Volume Summary option to display the OpenVMS Disk Volume 
Summary (Figure 3-16). This page displays disk volume data, including path, 
volume name, disk block utilization, queue length, and operation rate. 
Figure 3-16 OpenVMS Disk Volume Summary 


[Figure: Disk Volume Summary page for node AFFS91, with the File, View, Fix, and Customize menus and the Node Summary, CPU, Memory, I/O, Disk, Lock Contention, and Cluster Summary tabs. Columns: Device Name, Host Path, Volume Name, Free, Queue, OpRate, Physical Size, Volume Size, and Volume Limit. Each listed volume (ALPHA..., COMMONS, KITS, QUORUM, USER1, and USER2) has a Physical Size and Volume Size of 8378028 and a Volume Limit of 8589312. Status line: OpenVMS Alpha node AFFS91 - Disk Survey - Volumes: 11 volumes (6 listed, 5 filtered out).]


The Disk Volume Summary page displays the data described in the following 
table. (The last two columns, Volume Size and Volume Limit, are displayed only 
on OpenVMS Version 7.3-2 and later systems.) 


Data            Description

Device Name     Standard OpenVMS device name that indicates where the device
                is located, as well as a controller or unit designation.
Host Path       Primary path (node) from which the device receives commands.
Volume Name     Name of the mounted media.
Used            Number of blocks on the volume that are in use.
% Used          Percentage of the number of volume blocks in use in relation
                to the total volume blocks available.
Free            Number of blocks of volume space available for new data from
                the perspective of the node on which the volume is mounted.
Queue           Average number of I/O operations pending for the volume (an
                indicator of performance; less than 1.00 is optimal).
OpRate          Operation rate for the most recent sampling interval. The rate
                measures the amount of activity on a volume. The optimal load
                is device specific.
Physical Size   Total number of blocks on the current physical disk device.
                This is the "Total Blocks" field of the $SHOW DEVICE/FULL
                display.
Volume Size     Current number of blocks available for file allocation. This
                is the "Logical Volume Size" field of the $SHOW DEVICE/FULL
                display. (For more information, see $SET VOLUME/SIZE.) This
                column is displayed only on OpenVMS Version 7.3-2 and later
                systems.
Volume Limit    Maximum number of blocks the volume can reach using Dynamic
                Volume Expansion. This is the "Expansion Size Limit" field of
                the $SHOW DEVICE/FULL display. (For more information, see
                $SET VOLUME/LIMIT.) This column is displayed only on OpenVMS
                Version 7.3-2 and later systems.
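The % Used, Queue, Volume Size, and Volume Limit columns lend themselves to a quick calculation. In this sketch, "expansion headroom" (Volume Limit minus Volume Size) is a derived figure introduced for illustration; the Availability Manager does not display it. With the values in Figure 3-16 (Volume Size 8378028, Volume Limit 8589312), the headroom is 211284 blocks. The Used and total values in the call are hypothetical.

```python
from typing import Optional


def volume_metrics(used: int, total: int, queue: float,
                   volume_size: Optional[int] = None,
                   volume_limit: Optional[int] = None) -> dict:
    metrics = {
        # % Used: blocks in use relative to the total volume blocks.
        "percent_used": round(100.0 * used / total, 2),
        # A Queue value below 1.00 is described as optimal.
        "queue_optimal": queue < 1.00,
    }
    # Volume Size and Volume Limit exist only on OpenVMS V7.3-2 and later.
    if volume_size is not None and volume_limit is not None:
        # Blocks still obtainable through Dynamic Volume Expansion.
        metrics["expansion_headroom"] = volume_limit - volume_size
    return metrics


print(volume_metrics(used=1000, total=4000, queue=0.06,
                     volume_size=8378028, volume_limit=8589312))
```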


If the Availability Manager detects that a disk volume size has increased, a 
VLSZCH event is signaled: 


AFFS55 Volume size of device $8$DKA200 (OPAL-X9U6) has changed 


In this message, AFFS55 is the node name, $8$DKA200 is the device name, and 
OPAL-X9U6 is the volume name. 
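The condition behind the VLSZCH event amounts to comparing successive samples of a volume's size. A minimal sketch, assuming the previous and current sizes are known; the function is hypothetical and only mirrors the message layout shown above.

```python
from typing import Optional


def vlszch_message(node: str, device: str, volume: str,
                   previous_size: Optional[int],
                   current_size: int) -> Optional[str]:
    """Return VLSZCH-style message text if the volume size increased."""
    if previous_size is not None and current_size > previous_size:
        return f"{node} Volume size of device {device} ({volume}) has changed"
    # First sample, unchanged size, or a smaller size: no event here.
    return None


print(vlszch_message("AFFS55", "$8$DKA200", "OPAL-X9U6", 8378028, 8589312))
```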
3.2.5.4 Windows Logical and Physical Disk Summaries 
On Windows nodes, the View menu lets you choose the following summaries: 


• Logical Disk Summary 
• Physical Disk Summary 


Windows Logical Disk Summary 


A logical disk is the user-definable set of partitions under a drive letter. The 
Windows Logical Disk Summary displays logical disk device data, including path, 
label, percentage used, free space, and queue statistics. 


To display the Logical Disk Summary page, follow these steps: 


1. Double-click a node name in the Node pane to display the Windows Node 
Summary. 


2. Click the Disk tab on the Windows Node Summary. 
The Availability Manager displays the Windows Logical Disk Summary page 
(Figure 3-17). 


Figure 3-17 Windows Logical Disk Summary 


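[Figure: Windows Logical Disk Summary for node AFFC36, with the File, View, and Customize menus and one row per drive (for example, E:) plus a Total row, all on path AFFC36. Status line: Windows NT Intel node AFFC36 - Logical Disk Summary.]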


This summary displays the following data: 


Data            Description

Disk            Drive letter (for example, C:), or Total, which is the
                summation of statistics for all the disks.
Path            Primary path (node) from which the device receives commands.
Label           Identifying label of a volume.
Type            File system type; for example, FAT or NTFS.
% Used          Percentage of disk space used.
Free            Amount of free space available on the logical disk unit.
Current Queue   Number of requests outstanding on the disk at the time the
                performance data is collected. It includes requests in
                progress at the time of data collection.
Average Queue   Average number of both read and write requests that were
                queued for the selected disk during the sample interval.
Transfers/Sec   Rate of read and write operations on the disk.
KBytes/Sec      Rate at which data is transferred to or from the disk during
                write or read operations. The rate is displayed in kilobytes
                per second.
% Busy          Percentage of elapsed time that the selected disk drive is
                busy servicing read and write requests.
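The Total row combines the per-drive rows. This guide does not say how the percentage and queue columns are combined, so the following sketch sums only the free-space and rate columns, which add up naturally. The column keys and drive values are hypothetical.

```python
# Columns summed for the Total row in this sketch. Which columns the
# Availability Manager actually sums is an assumption here.
ADDITIVE_COLUMNS = ("free", "transfers_per_sec", "kbytes_per_sec")


def total_row(disks):
    """Build a Total row by summing the additive columns of each drive."""
    return {col: sum(disk[col] for disk in disks) for col in ADDITIVE_COLUMNS}


disks = [  # hypothetical drives C: and E:
    {"disk": "C:", "free": 100, "transfers_per_sec": 0.5, "kbytes_per_sec": 3.0},
    {"disk": "E:", "free": 250, "transfers_per_sec": 0.25, "kbytes_per_sec": 4.5},
]
print(total_row(disks))
```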




Windows Physical Disk Summary 


A physical disk is a hardware disk device on your computer system. The Windows 
Physical Disk Summary displays physical disk data, including path, queue 
statistics, transfers, and bytes per second. 


To display the Windows Physical Disk Summary, follow these steps: 
1. Click the View menu on the Windows Logical Disk Summary. 
2. Click the Physical Disk Summary menu option. 


The Availability Manager displays the Windows Physical Disk Summary page 
(Figure 3-18). 


Figure 3-18 Windows Physical Disk Summary 


[Figure: Windows Physical Disk Summary for node AFFC53, with the File, View, and Customize menus; the Node Summary, CPU, Memory, and Disk tabs; and columns Disk, Path, Current Queue, Average Queue, Transfers/Sec, KBytes/Sec, % Busy, % Read Busy, and % Write Busy. Status line: Windows NT Intel node AFFC53 - Physical Disk Summary.]


This page displays the following data: 


Data            Description

Disk            Drive number (for example, 0, 1, or 2), or Total, which is
                the summation of statistics for all the disks.
Path            Primary path (node) from which the device receives commands.
Current Queue   Number of requests outstanding on the disk at the time the
                performance data is collected; it includes requests in
                service at the time of data collection.
Average Queue   Average number of read and write requests that were queued
                for the selected disk during the sample interval.
Transfers/Sec   Rate of read and write operations on the disk.
KBytes/Sec      Rate at which data is transferred to or from the disk during
                read or write operations. The rate is displayed in kilobytes
                per second.
% Busy          Percentage of elapsed time the selected disk drive is busy
                servicing read and write requests.
% Read Busy     Percentage of elapsed time the selected disk drive is busy
                servicing read requests.
% Write Busy    Percentage of elapsed time the selected disk drive is busy
                servicing write requests.


3.2.6 OpenVMS Lock Contention 


To display the OpenVMS Lock Contention page, click the Lock Contention tab on 
the OpenVMS Node Summary page (Figure 3-4). For all the nodes in the group 
you have selected, the Lock Contention page displays each resource for which a 
lock contention problem might exist. 
Note 


Lock contention data is accurate only if every node in an OpenVMS 
Cluster environment is in the same group. You might lose accuracy if you 
do not have all the nodes of a cluster in one group. 


3.2.6.1 Lock Contention Page in Decoded Format 


Figure 3-19 shows a sample Lock Contention page containing resource names in 
decoded format, which is the default. 


Figure 3-19 OpenVMS Lock Contention (Decoded Format) 


[Figure: Lock Contention page for node AFFS11, with the File, View, Fix, and Customize menus and columns Resource Name, Master Node, Parent Resource Name, Duration, Gr/Cv/Wt/St, and Status. Listed resources include APACHE$ACCEPT and APACHE$PARENT resources, QMAN$JBC_ALIVE_01 resources whose parent names begin "Q Mgr file for", and REG$MASTER_LOCK. A tooltip for a REG$MASTER_LOCK line shows the RSB, Node, Parent, Duration (0 01:21:22), Status (DIRENTRY VALID), and ValBlk dump fields. Status line: OpenVMS Alpha node AFFS11 - Lock Contention.]


(You can display a tooltip similar to the one shown in Figure 3-19 by holding the 
cursor on a resource line. See the Note in the introduction to this chapter for 
further details.) 


By selecting the View menu (on the Lock Contention page), followed by the 
Resource names menu item, you can choose to display the resource name and 
parent resource name in either of two formats: 


• Raw format (the format that SDA uses) 
• Decoded format (the default format) 


Figure 3-19 displays the resource names in decoded format. (The Availability 
Manager decodes common resource names.) 


The Lock Contention page displays the data described in Table 3-8. Numbered 
lines correspond to lines or items of data in the Lock Contention Log 
(Example 3-1). 
Table 3-8 Data on the OpenVMS Lock Contention Page 


Lock Log
Reference
Number     Data              Description

1          Resource Name     Resource name associated with the $ENQ system
                             service call.
2          Master Node       Node on which the resource is mastered.
3          Parent Resource   Name of the parent resource. No name is
                             displayed when a parent resource does not exist.
4          Duration          Time elapsed since the Availability Manager
                             first detected the contention situation.
5          Gr/Cv/Wt/St       Total number of locks in each of four states.
                             Numbers for these states appear only when you
                             are collecting lock data. The states are:

                             • Granted
                             • Converting
                             • Waiting
                             • Stalled

                             Stalled indicates one of several states
                             whenever a lock is waiting for a response from
                             another node in the cluster.
6          Status            Status of the lock. See the $ENQW description
                             of flags in the HP OpenVMS System Services
                             Reference Manual.
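The Gr/Cv/Wt/St column is a count of a resource's locks by state. The following Python sketch assumes each lock's state is available as a name such as Granted or Waiting; the mapping of state names to columns (including treating RETRY and SCSWAIT as stalled, per Table 3-10) is an assumption for illustration. With one granted and four waiting locks, the situation logged in Example 3-1, it produces 1/0/4/0.

```python
from collections import Counter

# Map lock-state names to the four columns. States not listed here
# (for example, RETRY or SCSWAIT) are counted as stalled.
STATE_COLUMN = {"Granted": "Gr", "Converting": "Cv", "Convert": "Cv",
                "Waiting": "Wt"}


def gr_cv_wt_st(states):
    """Format per-lock states as a Gr/Cv/Wt/St string."""
    counts = Counter(STATE_COLUMN.get(state, "St") for state in states)
    return "/".join(str(counts[col]) for col in ("Gr", "Cv", "Wt", "St"))


print(gr_cv_wt_st(["Granted", "Waiting", "Waiting", "Waiting", "Waiting"]))
```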


The tooltip that is displayed when you hold the cursor over a line of data in 
Figure 3-19 contains the data described in Table 3-8, as well as the information 
described in Table 3-9. 


Table 3-9 Lock Contention Tooltip Data 


Reference
Number     Data          Description

7          RSB           Address of the resource block
8          ValBlk dump   Resource value block dump in standard OpenVMS dump
                         format
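The dumps in this chapter follow the usual OpenVMS layout: each row covers 8 bytes, the hexadecimal longwords are printed with the highest address on the left, and the text column reads left to right with a period for each unprintable byte. The following Python sketch reproduces that layout. It was reconstructed from the Parent Resource Name dump in Example 3-1, so treat the spacing as an approximation; the function name is hypothetical.

```python
from typing import List


def vms_dump(data: bytes, row_size: int = 8) -> List[str]:
    """Format bytes as offset, longwords (highest address first), text."""
    lines = []
    for offset in range(0, len(data), row_size):
        row = data[offset:offset + row_size]
        # Split the row into longwords, lowest address first; a short
        # final row may end with a partial longword.
        longwords = [row[i:i + 4] for i in range(0, len(row), 4)]
        # Show each longword as a little-endian value (bytes reversed)
        # and print the highest-addressed longword first.
        hex_part = " ".join(lw[::-1].hex().upper() for lw in reversed(longwords))
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in row)
        lines.append(f"{offset:04X}: {hex_part} {text}")
    return lines


# Parent Resource Name bytes reconstructed from the dump in Example 3-1.
parent = bytes.fromhex("1CDD000000440000" "0002564445524F4F"
                       "54202020202000A0" "3702000000")
for line in vms_dump(parent):
    print(line)
```

The first byte, 1C (28), is consistent with the "initial count byte" in the dump heading: 28 bytes of name follow it.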


3.2.6.2 Lock Contention Page in Raw Format 


Figure 3-20 shows the Lock Contention page with resource name data displayed 
in raw format. It also shows the tooltip that is displayed when you hold the 
cursor over a line of data. 
Figure 3-20 OpenVMS Lock Contention (Raw Format) 


[Figure: Lock Contention page for node AFFS11 showing the same resources as Figure 3-19 in raw format (for example, QMAN$JBC_ALIVE_01 resources whose parent resource names begin QMAN$MSR_), with the tooltip for a REG$MASTER_LOCK line (Duration 0 01:24:24, Status DIRENTRY VALID, ValBlk dump). Status line: OpenVMS Alpha node AFFS11 - Lock Contention.]


In Figure 3-20, notice that a period is substituted for each unprintable character 
in the Resource Name and Parent Resource Name fields. 
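The period substitution can be sketched in one line of Python. The sketch assumes that "printable" means the ASCII range 0x20 through 0x7E; the sample resource name is hypothetical.

```python
def raw_resource_name(name: bytes) -> str:
    """Render a resource name as in raw format: one period for each
    unprintable byte (printable assumed to mean ASCII 0x20-0x7E)."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in name)


print(raw_resource_name(b"QMAN$MSR_\x00\x01DKA100"))  # QMAN$MSR_..DKA100
```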


3.2.6.3 Lock Block Data 


When you click the handle that precedes any line of resource data, the 
Availability Manager displays the lock block data that is shown in Figure 3-21 
and Figure 3-22. 


Figure 3-21 OpenVMS Lock Block Data 


[Figure: Lock Contention page for node ANDA1A with several resources expanded to show lock block data under the headings Node, State, Process Name, LKID, Mode, Duration, and Flags. For example, the resource IOGEN$_22222 shows a lock granted on node SPNKY2 and a waiting lock on node SPNKY3, and a resource whose parent is an audit server journal on DISK$REGRES shows one granted AUDIT_SERVER lock and several AUDIT_SERVER locks converting from NL to PW. Status line: OpenVMS Alpha node ANDA1A - Lock Contention.]
Figure 3-22 OpenVMS Lock Block Data (Retry Stalled State) 


[Figure: Lock Contention page for node KOINE with lock block data for two resources. One lock on node KOINE3, owned by process TP_SERVER, is in the Retry stalled state; the other rows show locks in the Granted and Waiting states, including locks owned by JOB_CONTROL and QUEUE_MANAGER. Status line: OpenVMS Alpha node KOINE - Lock Contention.]


The lock block data in these two figures includes additional lock information 
under the headings shown in Table 3-10. Numbered lines correspond to lines or 
items of data in the Lock Contention Log (Example 3-1). 


Table 3-10 Lock Block Data 


Reference
Number     Data           Description

9          Node           Node name on which the lock is granted.
10         State          One of the following:

                          Color       Meaning
                          Green       Granted
                          Yellow      Converting
                          Pink        Waiting
                          Pale grey   Stalled states that are visible:

                                      SCSWAIT: A transient state indicating
                                      that a lock message has been sent to the
                                      node with the master lock and a response
                                      is awaited.

                                      RETRY: A transient state seen only under
                                      error conditions that require that a lock
                                      message be resent. This can occur if the
                                      node to which a lock message was sent
                                      goes down before a response from it is
                                      received, or if resources for sending a
                                      message cannot be allocated.

11         Process Name   Name of the process that owns the blocking lock.
12         LKID           Lock ID value (which is useful with SDA).


13         Mode           One of the following modes in which the lock is
                          granted or requested:¹

                          CR  Concurrent read   Grants read access and allows
                                                resource sharing with other
                                                readers and writers.
                          CW  Concurrent write  Grants write access and allows
                                                resource sharing with other
                                                writers.
                          EX  Exclusive         Grants write access and
                                                prevents resource sharing with
                                                any other readers or writers.
                          NL  Null              Grants no access; used as an
                                                indicator of interest or a
                                                placeholder for future lock
                                                conversion.
                          PR  Protected read    Grants read access and allows
                                                resource sharing with other
                                                readers, but not writers.
                          PW  Protected write   Grants write access and allows
                                                resource sharing with
                                                concurrent readers, but not
                                                with protected readers or
                                                other writers.

                          If one mode is displayed, it is the Granted mode; if
                          two modes are displayed, the first is the Granted
                          mode and the second is the Converting mode.

14         Duration       Length of time the lock has been in the current
                          queue since the console application found the lock.
15         Flags          Flags specified with the $ENQW request. See the
                          $ENQW entry in the HP OpenVMS System Services
                          Reference Manual.


¹ Descriptions are from Goldenberg, Ruth, and Saravanan, Saro, OpenVMS AXP Internals and Data 
Structures, Version 1.5, Digital Press, 1994. 
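When reading the Mode column, it helps to know which modes can coexist. The following Python sketch encodes the standard OpenVMS lock-mode compatibility rules that the mode descriptions above summarize, and splits a two-mode entry such as NL/PW into its Granted and Converting parts. It only compares a requested mode against modes already granted to other locks. The real lock manager also honors the order of the conversion and waiting queues, so treat this as an aid to reading the display, not a model of $ENQ.

```python
from typing import Iterable, Optional, Tuple

# For each mode already granted, the modes that can be granted alongside it.
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def parse_mode(field: str) -> Tuple[str, Optional[str]]:
    """Split a Mode entry: 'EX' -> ('EX', None), 'NL/PW' -> ('NL', 'PW')."""
    granted, _, converting = field.partition("/")
    return granted, converting or None


def compatible_with_granted(requested: str, granted: Iterable[str]) -> bool:
    """True if `requested` is compatible with every mode already granted."""
    return all(requested in COMPATIBLE[mode] for mode in granted)


# Example 3-1: one lock granted in EX mode, four requests for PR.
print(compatible_with_granted("PR", ["EX"]))  # False
print(parse_mode("NL/PW"))                    # ('NL', 'PW')
```

In Example 3-1, the single lock granted in EX mode is incompatible with each PR request, which is why the four PR locks are waiting.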


To interpret the information displayed on the OpenVMS Lock Contention 
page, you need to understand OpenVMS lock management services. For more 
information, see the HP OpenVMS System Services Reference Manual. 


3.2.6.4 Lock Block Log File 


Example 3-1 contains an excerpt of a lock block log file. You can find a lock block 
log file in either of the following locations: 


System    File Name                     Location

Windows   AvailManLock.log              Installation directory
OpenVMS   AvailManLock.log, prefaced    Directory to which the AMDS$AM_LOG
          by AMDS$AM_LOG                logical points


Numbers preceding lines or items of data in Example 3-1 correspond to numbered 
lines in Table 3-8, Table 3-9, and Section 3.2.6.3. Table 3-11 contains lines or 
items of data in a lock block log file that are not described in the other tables in 
this section. 
Table 3-11 Additional Data in the Lock Block Log File 


Lock Log
Reference
Number     Data from Example        Description

16         Reason for logging       In the example, the reason for logging is
                                    "the number of locks has changed." Other
                                    reasons include the "initial discovery of
                                    resource contention" or "lock data
                                    collection has been turned on."
17         GGMODE/CGMODE            Lock has been Granted/Lock is Converting.
18         Resource Name Dump       OpenVMS style of Resource Name dump.
19         RDB global database      Decoded Resource Name.
           name resource
20         Parent Resource Name     OpenVMS style of Parent Resource Name dump.
           Dump
21         RDB global database      Decoded Parent Resource Name.
           name resource
22         Lock data is being       The handle preceding a line of lock data
           collected                has been turned.
23         Master copy info.        Remote node that contains the master copy
           Remote Node              of the lock. If “Local Copy,” only one node
                                    is interested in the lock.
24         Master copy info.        Lock ID of the master copy of the lock on
           Remote Lock ID           the remote node.


Example 3-1 Lock Block Log File 


****************************************************************


11-Nov-2003 14:54:13.656 


Time: 


16)Reason for logging: 


2) Master Lock Node: 

1) Resource Name: 

17) GGMODE/CGMODE: 

6 Status: 

7 RSB Address: 

18) 
0000: 

8 Value Block Dump: 
0000: 
0008: 


19) Rdb Remote monitor resource 


3) Parent Resource Name: 
7) RSB Address: 
20) 


0000: 


0018: 


00000000 00000000 
00000000 00000000 


00004400 0000DD1C 
0008: 4F4F5245 44560200 
0010: A0002020 20202054 T 

00 00000237 7.... 


Number of locks has changed 


ALTOS 


EX/EX 
VALID 
FFFFFFFE. 889F1580 


elses 


2 


Resource Name Dump (includes initial count byte): 
000200 00004906 


Y...D....VDEROOT Pa ere 


FFFFFFFE.8847DB80 


iste Dd... 
. . VDEROO 


Resource Name Dump (includes initial count byte): 


(continued on next page) 
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Example 3-1 (Cont.) Lock Block Log File 


8) Value Block Dump: 
0000: 00000000 00000000 
0008: 00000000 00000000 


21) Rdb global database name resource 
Disk volume name: VDEROOT 


FID for file: (14240,2,0) 
22) Lock data is being collected 


5) Granted lock count: 1

5) Conversion lock count: 0 

5) Waiting lock count: 4 

5) Stalled lock count: 0 
10)      9)     11)                        12)       13)    Master copy info:     15)
                                                            23)         24)
Lock     Node   Process   Process          Lock      Gr/Cv  Remote      Remote    Flags
State           PID       Name             ID        Mode   Node        Lock ID

Granted  ALTOS  28E00441  RDMS_MONITOR70   04014B37  EX     (Local copy)          NQUE SYNC SYS
Waiting  ALTOS  2880023F  RDMS_MONITOR70   4C0065B5  PR     TSAVO       32005001  SYNC SYS NDLW
Waiting  ALTOS  00000000  (EPID=28A0023D)  4C0144C4  PR     ETOSHA      74005536  SYNC SYS NDLW
Waiting  ALTOS  28C00448  RDMS_MONITOR70   1D0144A3  PR     CHOBE       77005906  SYNC SYS NDLW
Waiting  ALTOS  28E026C3  VDESKEPT126A3    01014B2D  PR     (Local copy)          SYS NDLW


****************************************************************
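If you want to post-process a lock block log, the lock lines at the end of each entry can be split into their columns. The following Python sketch assumes the layout of the excerpt above: whitespace-separated fields, with "(Local copy)" in place of a remote node and remote lock ID. OpenVMS process names can contain spaces, which this simple split does not handle; the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class LockRow(NamedTuple):
    state: str
    node: str
    pid: str
    process_name: str
    lock_id: str
    mode: str
    remote_node: str              # "(Local copy)" when the master is local
    remote_lock_id: Optional[str]
    flags: Tuple[str, ...]


def parse_lock_row(line: str) -> LockRow:
    """Parse one lock line from a lock block log entry."""
    fields = line.split()
    state, node, pid, name, lock_id, mode = fields[:6]
    rest = fields[6:]
    if rest[:2] == ["(Local", "copy)"]:
        remote_node, remote_id, flags = "(Local copy)", None, rest[2:]
    else:
        remote_node, remote_id, flags = rest[0], rest[1], rest[2:]
    return LockRow(state, node, pid, name, lock_id, mode,
                   remote_node, remote_id, tuple(flags))


rows = [parse_lock_row(line) for line in (
    "Granted ALTOS 28E00441 RDMS_MONITOR70 04014B37 EX (Local copy) NQUE SYNC SYS",
    "Waiting ALTOS 2880023F RDMS_MONITOR70 4C0065B5 PR TSAVO 32005001 SYNC SYS NDLW",
)]
for row in rows:
    print(row.state, row.mode, row.remote_node, row.remote_lock_id, row.flags)
```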


3.3 OpenVMS Single Process Data 


When you double-click a row in the lower part of the OpenVMS Mode Details 
(Figure 3-7), OpenVMS CPU Process Summary (Figure 3-8), Memory 
(Figure 3-10), or I/O (Figure 3-12) page, the Availability Manager displays 
the first of several OpenVMS Single Process pages. 


Alternatively, you can right-click a row and select “Display...”. The View menu 
of the Single Process window contains three display options, shown in Figure 3-23. 


Figure 3-23 Single Process Window 


[Figure: Single Process window for detached process _RTA2: on node PESOS, with the View menu open over the Process Information tab (Username HILDE, Account DEBUG, UIC [14,253], PID 23E00135, Priority 6/4, State HIB).]
The choices in the “View” menu are the following:
• Tabs: individual tabs for each Single Process display:

— Process Information 

— Working Set 

— Execution Rates 

— Process Quotas 

— Wait States 

— Job Quotas 

— RAD Counters 


• Vertical Grid: all of the Single Process displays combined in one vertically
oriented grid


• Horizontal Grid: all of the Single Process displays combined in one
horizontally oriented grid


The following sections describe the individual tabs or sections of the vertical or 
horizontal grids. 


Each section refers to the vertical grid display shown in Figure 3-24. 
Figure 3-24 Single Process Vertical Grid Display

(Screen image: the PESOS Single Process <_RTA4:> (DETACHED) window in the
vertical grid view, showing the Process Information, Working Set, Wait States,
Execution Rates, Process Quotas, Job Quotas, and RAD Counters sections,
followed by the name of the current image.)


3.3.1 Process Information 


Table 3-12 describes the Process Information data shown in Figure 3-24. 


The data on this page is displayed at the default intervals shown for Single 
Process Data on the Data Collection Customization page. 


Table 3-12 Process Information

Data            Description

Process name    Name of the process.
Username        User name of the user who owns the process.
Account         Account string that the system manager assigns to the user.
UIC             User identification code (UIC). A pair of numbers or character
                strings that designate the group and user.
PID             Process identifier. A 32-bit value that uniquely identifies a
                process.
Owner ID        Process identifier of the process that created the process
                displayed on the page. If the Owner ID is 0, then the process
                is a parent process.
PC              Program counter.
                On OpenVMS Alpha systems, this value is displayed as 0 because
                the data is not readily available to the Data Collector node.
PS              Processor status longword (PSL). This value is displayed on VAX
                systems only.
Priority        Current and base priority of the process. Priority is an
                integer between 0 and 31. Processes with higher priority are
                given more CPU time.
State           One of the process states listed in Appendix A.
CPU Time        CPU time used by the process.


3.3.2 Working Set 


Table 3-13 describes the Working Set data shown in Figure 3-24. 


Table 3-13 Working Set

Data              Description

WS Global Pages   Shared data or code between processes, listed in pages
                  (measured in pagelets).
WS Private Pages  Amount of accessible memory, listed in pages (measured in
                  pagelets).
WS Total Pages    Sum of global and private pages (measured in pagelets).
WS Size           Working set size. The number of pages (measured in pagelets)
                  of memory the process is allowed to use. This value is
                  periodically adjusted by the operating system based on
                  analysis of page faults relative to CPU time used. Increases
                  in large units indicate that a process is taking many page
                  faults and that its memory allocation is increasing.
WS Default        Working set default. The initial limit of the number of
                  physical pages (measured in pagelets) of memory the process
                  can use. This parameter is listed in the user authorization
                  file (UAF); discrepancies between the UAF value and the
                  displayed value are due to page/longword boundary rounding or
                  other adjustments made by the operating system.
WS Quota          Working set quota. The maximum number of physical pages
                  (measured in pagelets) of memory the process can lock into
                  its working set. This parameter is listed in the UAF;
                  discrepancies between the UAF value and the displayed value
                  are due to page/longword boundary rounding or other
                  adjustments made by the operating system.
WS Extent         Working set extent. The maximum number of physical pages
                  (measured in pagelets) of memory the system will allocate for
                  the process. The system provides memory to a process beyond
                  its quota only when it has an excess of free pages; this
                  additional memory can be recalled if necessary. This
                  parameter is listed in the UAF; any discrepancies between the
                  UAF value and the displayed value are due to page/longword
                  boundary rounding or other adjustments made by the operating
                  system.
Images Activated  Number of times an image is activated.
Mutexes Held      Number of mutual exclusions (mutexes) held. Persistent values
                  other than zero (0) require analysis. A mutex is similar to a
                  lock but is restricted to one CPU. When a process holds a
                  mutex, its priority is temporarily increased to 16.
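The relationships in Table 3-13 can be checked with simple arithmetic. The sketch below is illustrative only: it assumes all values are in pagelets, uses the standard OpenVMS pagelet size of 512 bytes, and plugs in sample values like those visible in the Figure 3-24 screenshot (896 global and 37920 private pagelets, WS Size 52384, WS Quota 32768, WS Extent 522240). The function names are hypothetical and are not part of the Availability Manager.

```python
# Sketch of the working set arithmetic described in Table 3-13.
# Assumption: all values are in pagelets, and an OpenVMS pagelet is 512 bytes.

PAGELET_BYTES = 512


def ws_total(global_pagelets, private_pagelets):
    """WS Total Pages is the sum of global and private pages."""
    return global_pagelets + private_pagelets


def pagelets_to_kib(pagelets):
    """Convert a pagelet count to kibibytes."""
    return pagelets * PAGELET_BYTES / 1024


def ws_size_region(ws_size, ws_quota, ws_extent):
    """Classify WS Size against the WS Quota and WS Extent limits."""
    if ws_size <= ws_quota:
        return "within quota"
    if ws_size <= ws_extent:
        # Memory beyond the quota is granted only when free pages are
        # plentiful, and the system can recall it if necessary.
        return "above quota, up to extent"
    # Should not occur if the extent limit is enforced.
    return "above extent"


if __name__ == "__main__":
    total = ws_total(896, 37920)
    print(total, pagelets_to_kib(total))          # 38816 19408.0
    print(ws_size_region(52384, 32768, 522240))   # above quota, up to extent
```

With these sample values, the working set has grown past its quota but remains within its extent, which is the situation the WS Extent row describes.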


3.3.3 Execution Rates 
Table 3-14 describes the Execution Rates data shown in Figure 3-24. 


Table 3-14 Execution Rates 


Data Description 


CPU Percent of CPU time used by this process. The ratio of CPU time to 
elapsed time. 


Direct I/O Rate at which I/O transfers take place from the pages or pagelets 
containing the process buffer that the system locks in physical memory 
to the system devices. 


Buffered I/O Rate at which I/O transfers take place for the process buffer through an
intermediate buffer from the system buffer pool.


Paging I/O Rate of read attempts necessary to satisfy page faults. This is also
known as page read I/O or the hard fault rate.


Page Faults Page faults per second for the process. 


3.3.4 Quotas 
Table 3-15 describes the Process Quotas data shown in Figure 3-24. 


Note that when you display the SWAPPER process, no values are listed in this 
section. The SWAPPER process does not have quotas defined in the same way as 
other system and user processes do. 


Table 3-15 Quotas 


Data Description 

Direct I/O The current number of direct I/Os used compared with the possible
limit.

Buffered I/O The current number of buffered I/Os used compared with the possible
limit.

ASTs Asynchronous system traps. The current number of ASTs used
compared with the possible limit.


CPU Time Amount of time used compared with the possible limit. "No Limit" is 
displayed if the limit is zero. 


3.3.5 Wait States 
Table 3-16 describes the Wait States data shown in Figure 3-24. 


In the graph, “Current” refers to the percentage of elapsed time each process 
spends in one of the computed wait states. If a process spends all its time waiting 
in one state, the total gradually reaches 100%. 
How Wait States are Calculated

The wait state specifies why a process cannot execute, based on calculations 
made on collected data. Each value is calculated over an entire data collection 
period of approximately 2 minutes. The graph shows, over this period of time, 
the percentage of time a process spends in each wait state. Each value is an 
exponential average that approximates a moving average. A more detailed 
explanation follows. 


When monitoring of a single process starts, all wait state values are zero. When
the system periodically checks the process, the system first subtracts 10% from
each value. It then adds a value of 10 to the wait state the process is currently
in, if any.


For example, at the start, if a process is found to be in the Control wait state, the 
graph immediately registers 10 for Control. If the process is still in the Control 
wait state the next time it is checked, the graph shows Control at 19. This value 
is 90% of the original 10 (or 9), plus 10 (the value currently being added). 


The next time the process is checked, if it is found to be in the Buffered I/O wait 
state, Buffered I/O is set to 10 and Control is set to 17 (approximately 90% of the 
previous value of 19). 


The following time the process is checked, if it is not in a wait state at all, 
Buffered I/O is set to 9 (90% of 10), and Control is set to 15 (90% of 17). 
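The update rule can be sketched as a short simulation. This is a minimal sketch, not the Availability Manager's code. It assumes only what the text states: at each check, every value is multiplied by 0.9, and 10 is added to the current wait state. The state names are the Table 3-16 categories. The `update` and `simulate` names are hypothetical.

```python
# Sketch of the wait-state exponential average described above.
# Assumptions: the state names are the Table 3-16 categories; how the
# Availability Manager actually samples and stores these values is not shown.

WAIT_STATES = ("Compute", "Memory", "Direct I/O", "Buffered I/O",
               "Control", "Quotas", "Explicit")

DECAY = 0.9      # each check keeps 90% of the previous value (subtracts 10%)
INCREMENT = 10   # added to the wait state the process is currently in


def update(values, current_state):
    """Apply one periodic check: decay every value, then credit the current state.

    current_state is None when the process is not in any wait state.
    """
    new_values = {state: value * DECAY for state, value in values.items()}
    if current_state is not None:
        new_values[current_state] += INCREMENT
    return new_values


def simulate(observations):
    """Return the wait-state values after each check, starting from all zeros."""
    values = {state: 0.0 for state in WAIT_STATES}
    history = []
    for observed in observations:
        values = update(values, observed)
        history.append(dict(values))
    return history


if __name__ == "__main__":
    # The four checks from the worked example: Control, Control, Buffered I/O, none.
    for step in simulate(["Control", "Control", "Buffered I/O", None]):
        print(round(step["Control"], 2), round(step["Buffered I/O"], 2))
```

This prints 10.0 0.0, 19.0 0.0, 17.1 10.0, and 15.39 9.0. The text rounds these to 17 and 15. If a process stays in one state, the value approaches the fixed point of x = 0.9x + 10, which is x = 100. This is why the total gradually reaches 100%.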


Appendix A contains descriptions of wait states. 


Table 3-16 Wait States 


Data Description 

Compute Average percentage of time that the process is waiting for CPU time. 
Possible states are COM, COMO, or RWCAP. 

Memory Average percentage of time that the process is waiting for a page fault
that requires data to be read from disk; this is common during image
activation. Possible states are PFW, MWAIT, COLPG, FPG, RWPAG,
RWNPG, RWMPE, or RWMPB.


Direct I/O Average percentage of time that the process waits for data to be read 
from or written to a disk or tape. The possible state is DIO. 


Buffered I/O Average percentage of time that the process waits for data to be read 
from or written to a slower device such as a terminal, line printer, 
mailbox, or network traffic. The possible state is BIO. 


Control Average percentage of time that the process is waiting for another 
process to release control of some resource. Possible states are CEF, 
MWAIT, LEF, LEFO, RWAST, RWMBX, RWSCS, RWCLU, RWCSV, 
RWUNK, or LEF waiting for an ENQ. 


Quotas Average percentage of time that the process is waiting because the 
process has exceeded some quota. Possible states are QUOTA or 
RWAST_QUOTA. 


Explicit Average percentage of time that the process is waiting because the 
process asked to wait, such as a hibernate system service. Possible 
states are HIB, HIBO, SUSP, SUSPO, or LEF waiting for a TQE. 
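Table 3-16 maps process states to the wait-state categories shown in the graph. This mapping can be written as a lookup table. The sketch below is a hypothetical illustration built only from the table above. It shows that MWAIT is listed under two categories. It also shows that LEF is assigned by what the process is waiting for: an ENQ counts as Control, and a TQE counts as Explicit. The `categories_for` function and its `waiting_for` parameter are assumptions, not an Availability Manager interface.

```python
# Hypothetical lookup table built from Table 3-16. The category names and
# state codes come from the table; the data structure itself is illustrative.

WAIT_STATE_CATEGORIES = {
    "Compute": {"COM", "COMO", "RWCAP"},
    "Memory": {"PFW", "MWAIT", "COLPG", "FPG", "RWPAG", "RWNPG",
               "RWMPE", "RWMPB"},
    "Direct I/O": {"DIO"},
    "Buffered I/O": {"BIO"},
    "Control": {"CEF", "MWAIT", "LEF", "LEFO", "RWAST", "RWMBX",
                "RWSCS", "RWCLU", "RWCSV", "RWUNK"},
    "Quotas": {"QUOTA", "RWAST_QUOTA"},
    "Explicit": {"HIB", "HIBO", "SUSP", "SUSPO"},
}


def categories_for(state, waiting_for=None):
    """Return the Table 3-16 categories that a process state can count toward.

    waiting_for disambiguates LEF: "ENQ" counts as Control, "TQE" as Explicit.
    MWAIT is listed under both Memory and Control, so both are returned.
    A state that is not a wait state (for example, CUR) returns [].
    """
    if state == "LEF" and waiting_for == "TQE":
        return ["Explicit"]
    if state == "LEF" and waiting_for == "ENQ":
        return ["Control"]
    return [name for name, states in WAIT_STATE_CATEGORIES.items()
            if state in states]


if __name__ == "__main__":
    print(categories_for("HIB"))         # ['Explicit']
    print(categories_for("MWAIT"))       # ['Memory', 'Control']
    print(categories_for("LEF", "TQE"))  # ['Explicit']
```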
3.3.6 Job Quotas


Table 3-17 describes the Job Quota data shown in Figure 3-24. 


Table 3-17 Job Quotas

Data               Description                              AUTHORIZE Quota

Open File Count    Current number of open files compared    FILLM
                   with the possible limit.
Paging File Count  Current number of disk blocks in the     PGFLQUOTA
                   page file that the process can use
                   compared with the possible limit.
Enqueue Count      Current number of resources (lock        ENQLM
                   blocks) queued compared with the
                   possible limit.
TQE Count          Current number of timer queue entry      TQELM
                   (TQE) requests compared with the
                   possible limit.
Subprocess Count   Current number of subprocesses created   PRCLM
                   compared with the possible limit.
Byte Count         Current number of bytes used for         BYTLM
                   buffered I/O transfers compared with
                   the possible limit.
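Each row of Table 3-17 compares a current value with a limit that is set by an AUTHORIZE quota. To flag quotas that are close to exhaustion, you can express the current value as a percentage of the limit. The sketch below is hypothetical. The sample values are taken from the Figure 3-24 screenshot. The alert threshold is an arbitrary assumption and is not an Availability Manager setting. A limit of zero is treated as "no limit".

```python
from typing import NamedTuple, Optional


class JobQuota(NamedTuple):
    label: str      # row label in Table 3-17
    uaf_name: str   # AUTHORIZE quota that sets the limit
    current: int
    limit: int


def percent_used(quota: JobQuota) -> Optional[float]:
    """Percentage of the limit in use, or None if no limit is set."""
    if quota.limit <= 0:
        return None
    return 100.0 * quota.current / quota.limit


def near_limit(quotas, threshold_percent=80.0):
    """Return the AUTHORIZE names of quotas at or above threshold_percent."""
    flagged = []
    for quota in quotas:
        used = percent_used(quota)
        if used is not None and used >= threshold_percent:
            flagged.append(quota.uaf_name)
    return flagged


SAMPLE = [
    JobQuota("Open File Count", "FILLM", 14, 250),
    JobQuota("Paging File Count", "PGFLQUOTA", 119648, 750000),
    JobQuota("Enqueue Count", "ENQLM", 3, 2000),
    JobQuota("TQE Count", "TQELM", 4, 20),
    JobQuota("Subprocess Count", "PRCLM", 0, 60),
    JobQuota("Byte Count", "BYTLM", 3008, 128000),
]

if __name__ == "__main__":
    for q in SAMPLE:
        print(f"{q.label:18} {q.uaf_name:10} {percent_used(q):6.2f}%")
    print(near_limit(SAMPLE, threshold_percent=15.0))  # ['PGFLQUOTA', 'TQELM']
```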


3.3.7 RAD Counters 


Table 3-18 describes the RAD Counters data shown in Figure 3-24. The RAD
(Resource Affinity Domain) Counters data page is displayed for Alpha and I64
systems.


Table 3-18 RAD Counters Data 


Data Description 

Private Number of process private pages on RAD 0. 
Shared Number of process shared pages on RAD 0. 
Global Number of global pages on RAD 0. 
Displaying OpenVMS Cluster Data


The Availability Manager displays a great deal of OpenVMS Cluster data. The 
amount of cluster information has increased in each successive version of the 
product and will probably continue to do so. To accommodate future growth in 
this area, beginning with Availability Manager Version 2.2, OpenVMS Cluster 
information is documented in a separate chapter of this manual. 


By clicking a series of “handles” on the cluster node tree in the Cluster Members
pane of the Cluster Summary page (Figure 4-1), you can open lines of data to
display progressively more detailed cluster data. This chapter describes the data 
you can display. 


Support for Managed Objects 


New support has been added to the OpenVMS Data Collector, RMDRIVER, for 
OpenVMS managed objects, which are operating system components with 
characteristics that allow the Availability Manager to manage them. Managed 
objects, which register themselves with the Data Collector at system startup, not 
only provide data but also implement fixes in response to client requests. 


In OpenVMS Version 7.3 and later versions, cluster data and fixes are available 
for LAN virtual circuits through the managed object interface. When the 
Availability Manager Data Analyzer connects to a Data Collector node, it 
retrieves a list of the managed objects on that node, if any. For such a node, the 
Availability Manager can provide additional details and any new data that would 
otherwise be unavailable. 


Note 


To enable managed object data collection on nodes running OpenVMS 
Version 7.3 and later, the system manager must take steps so that the 
Data Collector driver, RMDRIVER, is loaded early in the boot process. 
For more details on how to enable collection of managed object data, see 
the Installation Instructions. 


LAN Displays 


When you monitor OpenVMS Version 7.3 and later nodes with managed objects 
enabled, additional cluster data and fixes are available for LAN virtual circuits. 
This data includes enhanced LAN virtual circuit summary data in the Cluster 
Summary window and the LAN Virtual Circuit Details (NISCA) window. In 
addition, the Cluster Summary includes virtual circuit, channel, and device fixes. 
If managed object support is not enabled for a Data Collector node, then only 
basic virtual circuit data is available. 
4.1 OpenVMS Cluster Summary Page


To display the OpenVMS Cluster Summary page (Figure 4-1), click the Cluster
Summary tab on an OpenVMS Node Summary page (Figure 1-5). 


The Cluster Summary page contains cluster interconnect information for an 
entire cluster as well as detailed information about each node in the cluster, 
including System Communications Services (SCS) circuits and connections for 
individual nodes. 


The data items shown on this page correspond to data that the Show Cluster
utility (SHOW CLUSTER) displays for the SYSTEMS, MEMBERS, CONNECTIONS, and
CIRCUITS classes. No SHOW CLUSTER counterpart exists for the PEDRIVER LAN
virtual circuit, channel, and device detail displays. The data items shown on
the page also correspond to data that the SCACP utility displays for SHOW
commands that display PORT, CIRCUIT, VC, CHANNEL, and LAN DEVICE information.


Figure 4-1 OpenVMS Cluster Summary 
The two panes in the Cluster Summary page display the following information:


• The Summary pane (top) displays summary information about the entire
cluster.


• The Cluster Members pane (bottom) displays detailed information about each
node in the cluster, including its System Communications Architecture (SCA)
connections with other nodes.
4.1.1 OpenVMS Cluster Event


The Availability Manager signals the LOVOTE event when cluster votes minus 
cluster quorum is less than the threshold value for the event. (The default 
threshold for the LOVOTE event is 1.) 


LOVOTE, ‘node’ VOTES count is close to or below QUORUM


4.1.2 OpenVMS Cluster Summary Pane 
Table 4-1 describes the data in the OpenVMS Cluster Summary pane
(Figure 4-1).


Table 4-1 Summary Pane Data

Data             Description

Formed           Date and time the cluster was formed.
Last Trans       Date and time of the most recent cluster state transition.
Votes            Total number of quorum votes being contributed by all cluster
                 members and by the quorum disk.
Expected Votes   The expected votes contribution by all members of the cluster.
                 This value is calculated from the maximum EXPECTED_VOTES
                 system parameter and the maximized value of the VOTES system
                 parameter.
Failover Step    Current failover step index. Shows which step in the sequence
                 of failover steps the failover is currently executing.
Members In       Number of cluster members to which the Availability Manager
                 has a connection.
Members Out      Number of cluster members to which the Availability Manager
                 either has no connection or has lost its connection.
Quorum¹          Number of votes that must be present for the cluster to
                 function and to permit user activity, that is, to “maintain
                 cluster quorum.”
QD Votes         Number of votes given to the quorum disk. A value of 65535
                 means no quorum disk exists.
Failover ID      Failover instance identification. Unique ID of a failover
                 sequence that indicates to system managers whether a failover
                 has occurred since the last time they checked.

¹You can adjust the quorum value by using the Adjust Quorum fix described in
Section 6.2.1.


4.1.3 OpenVMS Cluster Members Pane 


The Cluster Members pane, the lower pane on the Cluster Summary page
(Figure 4-1), lists all the nodes in the cluster and provides detailed
information about each one. Figure 4-2 shows only the Cluster Members pane.
Figure 4-2 OpenVMS Cluster Members Pane
The first level of information in the Cluster Members pane is cluster member
data, which is described in Table 4-2.


Table 4-2 Cluster Member Data


Data             Description

SCS Name         System Communications Services (SCS) name for the node
                 (system parameter SCSNODE).
SCSID            SCS identification for the node (system parameter
                 SCSYSTEMID).
CSID             Cluster system identification.
Votes            Number of votes the member contributes.
Expect           Member’s expected votes as set by the EXPECTED_VOTES system
                 parameter.
Quorum           Number of votes that must be present for the cluster to
                 function and permit user activity, that is, to “maintain
                 cluster quorum.”
LckDirWt         Lock manager distributed directory weight as determined by
                 the LCKDIRWT system parameter.
Status           Current cluster member status:

                 Status Value   Description

                 NEW            New system in cluster.
                 BRK_NEW        New system; there has been a break in the
                                connection.
                 MEMBER         System is a member of the cluster.
                 BRK_MEM        Member; there has been a break in the
                                connection.
                 NON            System is not a member of the cluster.
                 BRK_NON        Nonmember; there has been a break in the
                                connection.
                 REMOVED        System has been removed from the cluster.
                 BRK_REM        System has been removed from the cluster, and
                                there has also been a break in the connection.
Transition Time  The time of the system’s last change in cluster membership
                 status.
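Each BRK_ status value is a base status combined with a break in the connection. A script that reads these values can therefore split each value into those two parts. In the following hypothetical sketch, the BRK_ forms are mapped explicitly instead of having the prefix stripped, because they abbreviate the base name (for example, BRK_MEM corresponds to MEMBER, not MEM). The function name is illustrative only.

```python
from typing import Tuple

BASE_STATUS = {
    "NEW": "New system in cluster",
    "MEMBER": "System is a member of the cluster",
    "NON": "System is not a member of the cluster",
    "REMOVED": "System has been removed from the cluster",
}

# The BRK_ forms abbreviate the base value, so map them explicitly instead of
# stripping the prefix ("BRK_MEM" -> "MEMBER", not "MEM").
BROKEN_TO_BASE = {
    "BRK_NEW": "NEW",
    "BRK_MEM": "MEMBER",
    "BRK_NON": "NON",
    "BRK_REM": "REMOVED",
}


def parse_member_status(value: str) -> Tuple[str, bool]:
    """Split a Status value into (base status, connection_broken)."""
    if value in BASE_STATUS:
        return value, False
    if value in BROKEN_TO_BASE:
        return BROKEN_TO_BASE[value], True
    raise ValueError(f"unknown cluster member status: {value!r}")


if __name__ == "__main__":
    print(parse_member_status("MEMBER"))   # ('MEMBER', False)
    print(parse_member_status("BRK_MEM"))  # ('MEMBER', True)
```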
4.2 Summary Data in the Cluster Members Pane


The following sections contain descriptions of the categories of summary data
displayed in the Cluster Members pane (Figure 4-2).


When you click the handle before an SCS (System Communications Services) 
Name, the Availability Manager first displays a Ports heading, if managed object 
data collection is enabled on this SCS node. 


A port is an OpenVMS device that provides SCA (System Communications
Architecture) services. Port summary data is discussed in Section 4.2.1. Below
the Ports heading is the Circuits heading, which precedes a line of SCA headings.
(SCA data is discussed in Section 4.2.2.)


4.2.1 Port Summary Data 


When you initially click the handle in front of Ports in the Cluster Members pane 
(Figure 4-1) to a vertical position, Ports headings are displayed, with information 
about port interfaces on the local system, as shown in Figure 4-3. 


Figure 4-3 Port Summary Data 
The port summary data shown in Figure 4-3 is described in Table 4-3. Data
items in this table are related to the SCACP utility SHOW PORTS display and
the SHOW CLUSTER utility LOCAL_PORT CLASS display.
Table 4-3 Local Port Data

Data                 Description

Local Port:
  Name               Device name of the port.
  Number             The local port’s interconnect address or other
                     interconnect-specific identifier.
Mgmt Priority        Management priority assigned to the port.
Load Class           Hard-coded capacity value of the port, based on the rate
                     (in megabits/second) of the interconnect of the port.
Messages Sent:
  Count              Total number of messages sent since the port was
                     initialized.
  Rate               Rate at which messages are sent (per second).
Messages Received:
  Count              Total number of messages received since the port was
                     initialized.
  Rate               Rate at which SCS messages are received (per second).
Datagrams Sent:
  Count              Total number of SCS datagrams sent since the port was
                     initialized.
  Rate               Rate at which SCS datagrams are sent (per second).
Datagrams Received:
  Count              Total number of SCS datagrams received since the port was
                     initialized.
  Rate               Rate at which SCS datagrams are received (per second).
Kilobytes Mapped     Number of kilobytes mapped for block transfer.
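The Count columns in Table 4-3 are cumulative from port initialization, and the Rate columns are per second. To derive a rate yourself, take two samples of a cumulative counter and divide the difference between the counts by the elapsed time. This manual does not document how the Availability Manager computes its rates, so the sketch below is only an illustration of the general technique. It also assumes that a counter that goes backward has been reset by a port reinitialization. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterSample:
    count: int        # cumulative count since the port was initialized
    timestamp: float  # seconds, from any monotonic clock


def rate_per_second(earlier: CounterSample,
                    later: CounterSample) -> Optional[float]:
    """Per-second rate between two samples of a cumulative counter.

    Returns None if the counter went backward, which this sketch assumes
    means the port was reinitialized and the counter restarted.
    """
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = later.count - earlier.count
    if delta < 0:
        return None
    return delta / elapsed


if __name__ == "__main__":
    before = CounterSample(count=120_000, timestamp=0.0)
    after = CounterSample(count=123_000, timestamp=60.0)
    print(rate_per_second(before, after))  # 50.0 messages per second
```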


4.2.2 SCA (System Communications Architecture) Summary Data 


Below the Circuits heading in Figure 4-4 is a line of SCA summary headings
that include information about a node’s SCS circuits between local SCA ports and 
remote SCA ports on other nodes in the cluster. More than one circuit indicates 
more than one communications path to the other node. 


The data displayed in Figure 4-4 is similar to the information that the Show
Cluster utility (SHOW CLUSTER) displays for the CIRCUITS, CONNECTIONS,
and COUNTERS classes and that the SCACP utility’s SHOW CIRCUITS
command displays. Note that circuit count is the total number of events since
the state of the circuit changed to OPEN.


Starting with Availability Manager Version 2.2, the circuits display shows circuits 
to non-OpenVMS nodes, such as storage controllers. 
Figure 4-4 SCA Summary Data
Table 4-4 describes the SCA summary data displayed under the Circuits
heading in Figure 4-4. Each line of data shows either a summary of an SCS
connection between a local system application (SYSAP) and a remote SYSAP
that uses the circuit, or a summary of interconnect-specific information about
the operation of the circuit.


Some of the data described in Table 4-4 is not displayed in Figure 4-4 because
the screen display is wider than shown. You can scroll to the right on your 
terminal screen to display the remaining fields described in the table. 


Note 


Each rate referred to in Figure 4-4 is in messages per second. The
“Message Rates” data are rates; the remaining data items are counts. 


Table 4-4 SCA Summary Data

Data                 Description

Remote Node          SCS name of the remote node containing the remote port of
                     the circuit.
Local Port           The device name of the local port associated with the
                     circuit.
Remote Port:
  Type               The remote port’s device or interconnect type associated
                     with the circuit (for example, LAN, CIPCA, DSSI).
  Number             The remote port’s interconnect address, or other
                     interconnect-specific unique identifier.
  State              The state of the virtual circuit connection.
Priority:
  Curr               Circuit’s current priority, which is the sum of the
                     management priorities assigned to the circuit and
                     associated local port.
  Mgmt               Priority value assigned to the circuit by management
                     action.
Load Class           The circuit’s current capacity rating, derived from the
                     current ECS member’s load class values.
Message Rates:
  Sent               Count/rate of SCS messages sent over the circuit.
  Received           Count/rate of SCS messages received on the circuit.
Block Data (Kilobytes):
  Mapped             Count/rate of kilobytes mapped for block data transfers
                     over the circuit.
  Sent               Count/rate of kilobytes sent over the circuit using send
                     block data transfers.
  Requested          Count/rate of kilobytes requested from the remote port over
                     the circuit using request block data transfers.
Block Data (Count):
  Sent               Count/rate of send block data transfers over the circuit.
  Requested          Count/rate of block data transfer requests sent over the
                     circuit.
Datagrams:
  Sent               Count/rate of SCS datagrams sent over the circuit.
  Received           Count/rate of SCS datagrams received on the circuit.
Credit Wait          Count/rate of times any connection on the circuit had to
                     wait for a send credit.
Buff Desc Wait       Count/rate of times any connection on the circuit had to
                     wait for a buffer descriptor.


4.2.3 SCS (System Communications Services) Connections Summary Data 


You can click the handle at the beginning of an SCA data row to display the 
following headings when they apply to a particular node: 


• SCS Connections
• LAN Virtual Circuit Summary


To display SCS connections summary data, click the handle at the beginning 
of the “SCS Connections” row on the Cluster Summary pane (Figure 4-1). 
Figure 4-5 shows SCS Connections data.
Figure 4-5 SCS Connections Data
Table 4-5 describes the SCS connections data shown in Figure 4-5. Some of the
data described in Table 4-5 is not displayed in Figure 4-5 because the screen
display is wider than shown. You can scroll to the right on your terminal screen 
to display the remaining fields described in the table. 


Note that connection count is the total number of events since the state of the 
connection changed to OPEN. 


Table 4-5 SCS Connections Data

Data                 Description

SYSAPs:
  Local              Name of the SYSAP (system application) on the local system
                     associated with the connection.
  Remote             Name of the SYSAP on the remote system associated with the
                     connection.
State                The connection’s current state. The possible items
                     displayed are:
                     • ACCP_SENT: An accept request has been sent.
                     • CLOSED: The connection is closed.
                     • CON_ACK: A connect request has been sent and
                       acknowledged.
                     • CON_REC: A connect request has been received.
                     • CON_SENT: A connect request has been sent.
                     • DISC_ACK: A disconnect is acknowledged.
                     • DISC_MTCH: A disconnect request has matched.
                     • DISC_REC: A disconnect request has been received.
                     • DISC_SENT: A disconnect request has been sent.
                     • LISTEN: The connection is in the listen state.
                     • OPEN: The connection is open.
                     • REJ_SENT: A rejection has been sent.
                     • VC_FAIL: The virtual circuit has failed.
Message Rates:
  Sent               Count/rate of SCS messages sent over the connection.
  Received           Count/rate of SCS messages received on the connection.
Block Data (Kilobytes):
  Mapped             Count/rate of kilobytes mapped for block data transfers by
                     the local SYSAP using the connection. Note: This field is
                     available only in raw data format.
  Sent               Number of kilobytes sent over the SCS connection by the
                     local SYSAP using send block data transfers.
  Requested          Number of kilobytes requested over the SCS connection by
                     the local SYSAP using request block data transfers.
Block Data (Number):
  Sent               Count/rate of send block data transfers by this node over
                     the SCS connection.
  Requested          Count/rate of request block data transfers sent to the
                     remote port over the SCS connection.
Datagrams:
  Sent               Count/rate of datagrams sent on the SCS connection.
  Received           Count/rate of datagrams received on the SCS connection.
Credit Wait          Count/rate of times the connection had to wait for a send
                     credit.
Buff Desc Wait       Count/rate of times the connection had to wait for a
                     buffer descriptor.


4.2.4 LAN Virtual Circuit Summary Data 


You can display interconnect-specific LAN virtual circuit summary data by 
clicking the handle at the beginning of a “LAN Virtual Circuit Summary” row to 
a vertical position. The screen expands to display the interconnect-specific VC 
summary data shown in Figure 4-6. 


Figure 4-6 LAN Virtual Circuit Summary Data 
Much of the data in this display corresponds to the information displayed by the
SCACP command SHOW VC. The SHOW CLUSTER command does not provide 
a corresponding display. Which data items are displayed depends on the type of 
interconnect the virtual circuit is using. 
Currently, this feature is available only for LAN virtual circuits. VC Summary
displays for other cluster interconnects such as CI might be available in the
future. When other interconnects are supported, the interconnect type will be
displayed at the beginning of the line (for example, CI Virtual Circuit
Summary), and the associated heading will have interconnect-specific data items.


Note that LAN Virtual Circuit counters are initialized when PEDRIVER detects 
the existence of a PEDRIVER on a remote system. All of a LAN VC’s counters 
are cumulative from that time. 


Some of the data described in Table 4-6 is not displayed in Figure 4-6 because 
the screen display is wider than shown. You can scroll to the right on your 
terminal screen to display the remaining fields described in the table. 


Table 4-6 describes the LAN Virtual Circuit Summary data items shown in
Figure 4-6.


Table 4-6 LAN Virtual Circuit Summary Data

Data                 Description

VC State             Current internal state of the virtual circuit:
                     • OPEN: The virtual circuit is open and usable.
                     • PATH: At least one open channel has been established,
                       but the virtual circuit has not yet transitioned to OPEN.
                     • CLOSED: The virtual circuit has been closed or has
                       become unusable.
Total Errors         Number of times the virtual circuit has been closed or has
                     had other errors.
ReXmt Ratio          Ratio of the total number of transmitted packets to the
                     number of retransmitted packets during the most recent
                     data collection interval.
Channels:
  Open               Number of currently open channels available to the virtual
                     circuit.
  ECS                Number of equivalent channel set (ECS) channels currently
                     in use by the LAN virtual circuit.
ECS Priority         Priority a channel must have in order to be included in the
                     equivalent channel set (ECS). It is the highest priority
                     any open and tight channel has.
MaxPktSiz            Maximum data buffer size in use by this LAN virtual
                     circuit.
ReXmt TMO (usec)     Retransmission timeout, in microseconds. The length of time
                     the virtual circuit is currently using to wait for an
                     acknowledgment of the receipt of a packet before
                     retransmitting that packet.
XmtWindow:
  Cur                Current value of the transmit window (or pipe quota): the
                     maximum number of packets that are sent before stopping to
                     await an acknowledgment. After a timeout, the transmit
                     window is reset to 1 to decrease congestion; it is allowed
                     to increase as acknowledgments are received.
  Max                Maximum transmit window size currently allowed for the
                     virtual circuit.
Xmt Options          Transmit options enabled:
                     CKSM: packet checksumming
                     CMPR: compression
Packets:
  Sent               Number of packets sent over this virtual circuit.
  Received           Number of packets received over this virtual circuit.
Most recent:
  Time Opened        Most recent time the virtual circuit was opened.
  Time Closed        Most recent time the virtual circuit was closed.


4.2.5 LAN Path (Channel) Summary Data 


A LAN path or channel is a logical communication path between two LAN 
devices. Channels between nodes are determined by a local device, a remote 
device, and the connecting network. For example, two nodes, each having two 
devices, might establish four channels between the nodes. The packets that 
a particular LAN virtual circuit carries can be sent over any open channel 
connecting the two nodes. 
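The four-channel example follows from pairing each local LAN device with each remote LAN device. A minimal Python sketch, assuming that every local/remote device pair is reachable over the connecting network (in practice, a pair whose devices do not share a LAN never opens a channel). The function name and the device names in the usage line are hypothetical.

```python
from itertools import product


def candidate_channels(local_devices, remote_devices):
    """Return every (local, remote) LAN device pair that could form a channel.

    Assumes each local device can reach each remote device over the
    connecting network; unreachable pairs would never open a channel.
    """
    return list(product(local_devices, remote_devices))


# Two devices on each node yield 2 x 2 = 4 potential channels.
channels = candidate_channels(["EWA", "EWB"], ["EIA", "EIB"])
for local, remote in channels:
    print(f"{local} -> {remote}")
```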


The difference between channels and virtual circuits is that channels provide 
datagram service. Virtual circuits, layered on channels, provide error-free 
paths between nodes. Multiple channels can exist between nodes in an OpenVMS 
Cluster system, but only one LAN-based virtual circuit can exist between any two 
nodes at a time. 


LAN channel counters are initialized when PEDRIVER detects the existence of
a LAN device on a remote system. All of a LAN channel’s counters are cumulative
from that time. For more information about channels and virtual circuits, see the
HP OpenVMS Cluster Systems manual.
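Because the channel counters are cumulative, a rate for a given interval has to be derived from the difference between two samples. The Python sketch below shows that arithmetic. The function name is hypothetical, and treating a counter that goes backward as a reset (for example, after PEDRIVER detects the remote device again) is an assumption.

```python
from typing import Optional


def counter_rate(prev_count: int, prev_time_s: float,
                 count: int, time_s: float) -> Optional[float]:
    """Average per-second rate of a cumulative counter between two samples."""
    elapsed = time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = count - prev_count
    if delta < 0:
        # Counter went backward: assume it was reinitialized, so the
        # interval's rate cannot be determined from these two samples.
        return None
    return delta / elapsed


# 300 packets sent in a 30-second interval -> 10.0 packets per second.
print(counter_rate(1200, 0.0, 1500, 30.0))
```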


Displaying Data 

You can display LAN channel summary data by clicking the handle at the 
beginning of a “LAN Virtual Circuit Summary Data” row (Figure 4-6), or by 
right-clicking a data item and choosing the Channel Summary item from the 
shortcut menu. The screen expands to display the LAN channel summary data 
shown in Figure 4-6. If there is no handle at the beginning of a “LAN Virtual 
Circuit Summary” data row, then managed object data collection is not enabled 
for this SCS node. 


The data items displayed depend on the type of virtual circuit. Currently, this 
feature is available only for LAN virtual circuits. 


Some of the data described in Table 4-7 is not displayed in Figure 4-6 because
the screen display is wider than shown. You can scroll to the right on your
terminal screen to display the remaining fields described in the table.




Table 4-7 LAN Path (Channel) Data

Data                    Description

Devices:
  Local                 Local LAN device associated with the channel.
  Remote                Remote LAN device associated with the channel.

Channel State           One of the following states:
                        • OPEN: Channel is usable.
                        • PATH: Channel handshake has been completed and, if
                          usable, the channel will transition to OPEN.
                        • CLOSED: Channel has been shut down or is unusable.

Total Errors            Total of various error counters for this channel (see
                        the channel details for a breakdown).

ECS State               Channel ECS membership information:
                        • Y: Member
                        • N: Nonmember

                        Losses, one of the following:
                        • T (tight): Packet loss history is acceptable.
                        • L (lossy): Recent history of packet losses makes the
                          channel unusable.

                        Capacity, one of the following:
                        • P (peer): Priority and buffer size both match the
                          highest corresponding values of the set of tight
                          channels, entitling the channel to be an ECS member.
                        • I (inferior): Priority or buffer size does not match
                          the corresponding values of the set of tight
                          channels.
                        • S (superior): Priority or buffer size is better than
                          the current corresponding values of the set of ECS
                          member channels. This is a short-lived, transient
                          state because it exists only while the ECS
                          membership criteria are being re-evaluated.
                        • U (unevaluated): Priority or buffer size, or both,
                          have not been evaluated against the ECS criteria,
                          usually because the channel is lossy.

                        Speed, one of the following:
                        • F (fast): Channel delay is among the best for tight
                          and peer channels.
                        • S (slow): Channel delay makes the channel too slow to
                          be usable because it would limit the virtual
                          circuit’s average delay.

                        Note: If a channel is lossy, its capacity and speed are
                        not always kept current. Therefore, the displayed
                        values might be those that the channel had at the time
                        it became lossy.

Priority:
  Cur                   Current priority used to evaluate the channel for ECS
                        membership. This is the sum of management priority
                        values assigned to the LAN device.
  Mgmt                  Dynamic management-assigned priority.

Hops                    Number of switches or bridges in this channel’s network
                        path to the remote LAN device.

BufSiz                  Current maximum amount of SCS data that can be
                        contained in a packet sent over the channel. It is the
                        smallest of the following values:
                        • Local LAN device buffer sizes
                        • Remote LAN device buffer sizes
                        • Local NISCS_MAX_PKTSZ system (SYSGEN) parameter
                          values
                        • Remote NISCS_MAX_PKTSZ system (SYSGEN) parameter
                          values
                        • Largest packet size that the intervening network can
                          deliver, as determined by the NISCA Channel Packet
                          Size probing algorithm

Delay (usec)            Running average of measured round-trip time, in
                        microseconds, for packets sent over the channel.

Load Class              Load class initialized from local and remote LAN device
                        bit rates.

Packets:
  Sent                  Number of packets sent on this channel, including
                        control packets.
  Received              Number of packets received by this channel.

Most recent:
  Time Opened           Last time this channel had a verified usable path to a
                        remote system.
  Time Closed           Time that this channel was last closed.
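The BufSiz rule is a minimum over five limits, so the most restrictive limit always determines the result. A Python sketch with hypothetical parameter names; the byte values in the example are illustrative only and are not taken from any real configuration.

```python
def channel_buf_size(local_device_buf: int, remote_device_buf: int,
                     local_niscs_max_pktsz: int, remote_niscs_max_pktsz: int,
                     probed_network_max: int) -> int:
    """BufSiz for a channel: the smallest of the five limits in Table 4-7."""
    return min(local_device_buf, remote_device_buf,
               local_niscs_max_pktsz, remote_niscs_max_pktsz,
               probed_network_max)


# Illustrative values only: the probed network limit is the tightest one here.
print(channel_buf_size(4000, 4000, 4000, 4000, 1412))  # 1412
```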


4.3 Detailed Data Accessed Through the Cluster Members Pane 


The following sections describe data that appears on lines that you can open in
the Cluster Members pane (Figure 4-2).


4.3.1 LAN Device Summary Data 


You can display LAN device summary data by first right-clicking a node name 
on the Cluster Members pane. On Version 7.3 or later nodes on which managed 
objects are enabled, the Availability Manager displays a menu with the following 
choices: 


• SCA Summary
• LAN Device Summary...

Click LAN Device Summary... to display the Device Summary Data page
(Figure 4-7).


Figure 4-7 LAN Device Summary Data 


You can right-click any data item on the page to display a menu with LAN 
Device Fixes... on it. These fixes are explained in Chapter 6. 


Table 4-8 describes the LAN device summary data displayed in Figure 4-7. This
data is also displayed by the SCACP command SHOW LAN_DEVICE.


Table 4-8 LAN Device Summary Data

Data                    Description

LAN Device              Name of the LAN device used for cluster communications
                        between local and remote nodes.

                        The icon preceding each LAN device can be one of the
                        following colors:
                        • Black: not enabled (“Not in use by SCA”)
                        • Yellow: “Run” not set
                        • Red: “Run” and anything other than Online, Local, or
                          Restart
                        • Green: “Run” and a combination of Online, Local, and
                          Restart only

                        A tooltip indicates the possible states a device can be
                        in. This can be a combination of the following: Run,
                        Online, Local, Hello_Busy, Build_Hello, Init,
                        Wait_Mgmt, Wait_Evnt, Broken, XChain_Disabled,
                        Delete_pend, Restart, or Restart_Delay. Alternatively,
                        a tooltip might display “Not in use by SCA.”

Type                    Type of LAN device used for the cluster.

Errors                  Number of errors reported by the device since cluster
                        communications began using it.

Management:
  Priority              Current management-assigned priority of the device.
  BufSize               Current management-assigned maximum buffer size of the
                        device.

BufSize                 Smaller of the interconnect-specific buffer size of the
                        device and its current management-assigned buffer size.

Messages:
  Sent                  Number of LAN packets sent by the device.
  Received              Number of packets received from the remote LAN device.
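The icon color rules can be expressed as a short decision function. In this Python sketch (the function name is hypothetical), `None` stands for a device that is “Not in use by SCA”. Treating “Run” with no other state as green is an assumption, because the table does not address that case.

```python
from typing import Optional, Set

GREEN_COMPANIONS = frozenset({"Online", "Local", "Restart"})


def lan_device_icon_color(states: Optional[Set[str]]) -> str:
    """Map LAN device state flags to the Table 4-8 icon colors.

    `states` is None for a device that is "Not in use by SCA"; otherwise it
    is a set of state names such as {"Run", "Online"}.
    """
    if states is None:
        return "Black"
    if "Run" not in states:
        return "Yellow"
    # Run plus only Online, Local, or Restart is green (Run alone is
    # assumed green as well); any other state alongside Run is red.
    if set(states) - {"Run"} <= GREEN_COMPANIONS:
        return "Green"
    return "Red"


print(lan_device_icon_color({"Run", "Online", "Restart"}))  # Green
```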


4.3.2 LAN Device Detail Data 


To display LAN device detail data, right-click a LAN Path (Channel) Summary 
data item on the LAN Virtual Circuit Summary data page (Figure 4-6). The 
Availability Manager then displays the shortcut menu shown in Figure 4-8. 


Figure 4-8 LAN Path (Channel) Details Menu


To display device details, select the LAN Device Details... item on the menu.
After a brief delay, a LAN Device Overview Data page (Figure 4-9) is displayed.


A series of tabs at the top of the LAN Device Overview Data page indicates
additional LAN device pages that you can display. Much of the LAN device
detail data corresponds to data displayed by the SCACP command
SHOW LAN_DEVICE.


4.3.2.1 LAN Device Overview Data 
The LAN Device Overview Data page (Figure 4-9) displays LAN device summary
data.
Figure 4-9 LAN Device Overview Data


Table 4-9 describes the data displayed in Figure 4-9. 


Table 4-9 LAN Device Overview Data

Data                    Description

Status                  Device status: Run, Online, Local, Hello_Busy,
                        Build_Hello, Init, Wait_Mgmt, Wait_Evnt, Broken,
                        XChain_Disabled, Delete_pend, Restart, or
                        Restart_Delay. Alternatively, “Not in use by SCA” can
                        be displayed.

Device Name             Name of the LAN device.

Device Type             OpenVMS device type value.

Total Errors            Total number of errors listed on the Errors page.

Priority                Dynamic management-assigned priority.

Max Buffer Size         Maximum data buffer size for this LAN device.

Mgmt Buffer Size        Dynamic management-assigned maximum block data field
                        size.

Load Class              Load class: the rate, in Mb/s, currently being reported
                        by the LAN device.

Receive Ring Size       Number of packets the LAN device can buffer before it
                        discards incoming packets.

Default LAN Address     LAN device’s hardware LAN address.

Current LAN Address     Current LAN address being used by this LAN device.
Current LAN Address Current LAN address being used by this LAN device. 


4.3.2.2 LAN Device Transmit Data 


The LAN Device Transmit Data page (Figure 4-10) displays LAN device transmit
data.


Figure 4-10 LAN Device Transmit Data


Table 4-10 describes the data displayed in Figure 4-10.


Table 4-10 LAN Device Transmit Data

Data                    Description

Messages Sent           Number of packets sent by this LAN device, including
                        multicast “Hello” packets.

Bytes Sent              Number of bytes in packets sent by this LAN device,
                        including multicast “Hello” packets.

Multicast Msgs Sent     Number of multicast “Hello” packets sent by this LAN
                        device.

Multicast Bytes Sent    Number of multicast bytes in “Hello” packets sent by
                        this LAN device.

Outstanding I/O Count   Number of transmit requests being processed by the LAN
                        driver.


4.3.2.3 LAN Device Receive Data 


The LAN Device Receive Data page (Figure 4-11) displays LAN device receive
data.


Figure 4-11 LAN Device Receive Data


Table 4-11 describes the data displayed in Figure 4-11.


Table 4-11 LAN Device Receive Data

Data                    Description

Messages Rcvd           Number of packets received by this LAN device,
                        including multicast packets.

Bytes Received          Number of bytes in packets received by this LAN device,
                        including multicast packets.

Multicast Msgs Rcvd     Number of multicast NISCA packets received by this LAN
                        device.

Multicast Bytes Rcvd    Number of multicast bytes received by this LAN device.


4.3.2.4 LAN Device Events Data 


The LAN Device Events Data page (Figure 4-12) displays LAN device events 
data. 
Figure 4-12 LAN Device Events Data


Table 4-12 describes the data displayed in Figure 4-12.


Table 4-12 LAN Device Events Data

Data                    Description

Port Usable             Number of times the LAN device became usable.

Port Unusable           Number of times the LAN device became unusable.

Address Change          Number of times the LAN device’s LAN address changed.

Restart Failures        Number of times the LAN device failed to restart.

Last Event              Event type of the last LAN device event (for example,
                        a LAN address change or an error).

Time of Last Event      Time the last event occurred.


4.3.2.5 LAN Device Errors Data 


The LAN Device Errors Data page (Figure 4-13) displays LAN device errors 
data. 
Figure 4-13 LAN Device Errors Data


Table 4-13 describes the data displayed in Figure 4-13.


Table 4-13 LAN Device Errors Data

Data                    Description

Bad SCSSYSTEM ID        Received a packet with the wrong SCSSYSTEM ID in it.

MC Msgs Directed to TR  Number of multicast packets directed to the NISCA
Layer                   Transport layer.

Short CC Messages       Number of packets received that were too short to
Received                contain a NISCA channel control header.

Short DX Messages       Number of packets received that were too short to
Received                contain a NISCA DX header.

CH Allocation Failures  Number of times the system failed to allocate memory
                        for use as a channel structure in response to a packet
                        received by this LAN device.

VC Allocation Failures  Number of times the system failed to allocate memory
                        for use as a VC structure in response to a packet
                        received by this LAN device.

Wrong Port              Number of packets addressed to the wrong NISCA address.

Port Disabled           Number of packets discarded because the LAN device was
                        disabled.

H/W Transmit Errors     Number of local hardware transmit errors.

Hello Transmit Errors   Number of transmit errors during HELLOs.

Last Transmit Error     Reason for the last transmit error.
Reason

Time of Last Transmit   Date and time of the last transmit error.
Error
4.3.3 LAN Path (Channel) Detail Data


To display LAN path (channel) detail data, right-click a LAN channel summary 
data item on the Cluster Summary page (Figure 4-6). The Availability Manager 
displays a shortcut menu with the options shown in Figure 4-8. 


To display LAN channel details, select the Channel Details... item on the
menu. After a brief delay, a LAN Channel Overview Data page (Figure 4-14)
is displayed. A series of tabs at the top of this page indicates additional channel
pages that you can display.


4.3.3.1 LAN Channel Overview Data


The LAN Channel Overview Data page (Figure 4-14) displays general channel 
data, including the state, status, and total errors of the channel. 


Figure 4-14 LAN Channel Overview Data 


Table 4-14 describes the data displayed in Figure 4-14. 


Table 4-14 LAN Channel Overview Data

Data                    Description

State                   Channel’s current state: OPEN, PATH, or CLOSED.

Status                  Channel status.

Total Errors            Sum of the channel’s error counters.

Time Opened             Last time that this channel had a path to a remote
                        system.

Time Closed             Last time that this channel was closed.

Total Time Open         Total time that this channel has been open.

Device Name             Local LAN device name.

Device Type             Local LAN device type.

Average RTT             Average of measured round-trip time.

RSVP Threshold          Number of packets before requesting that the remote
                        node immediately return an acknowledgment.

Remote Ring Size        Number of entries in the remote LAN device’s receive
                        ring.

Remote Device Type      Remote LAN device type.

Remote T/R Cache        Number of out-of-order packets that the remote
                        transmit/receive resequencing cache can buffer.

LAN H/W Address         LAN device’s hardware address.


4.3.3.2 LAN Channel Counters Data 


The LAN Channel Counters Data page (Figure 4-15) displays path counters data, 
including ECS transitions as well as messages and bytes sent. 


Figure 4-15 LAN Channel Counters Data 
Table 4-15 describes the data displayed in Figure 4-15.


Table 4-15 LAN Channel Counters Data

Data                    Description

ECS Transitions         Number of times this channel has been in and out of
                        the equivalent channel set (ECS).

Messages Sent           Number of packets sent over this channel, including
                        control packets.

Bytes Sent              Number of bytes transmitted on this channel, including
                        control packets.

Control Messages Sent   Number of control packets sent, not including
                        multicast packets.

Control Msg Bytes Sent  Number of control packet bytes sent, not including
                        multicast packets.

Messages Received       Number of packets received by this channel.

Bytes Received          Number of bytes in packets received by this channel.

MC Control Messages     Number of multicast control packets received.
Rcvd

MC Control Msg Bytes    Number of multicast control packet bytes received.
Rcvd

Control Messages Rcvd   Number of control packets received.

Control Msg Bytes Rcvd  Number of control packet bytes received.


4.3.3.3 LAN Channel Errors Data 


The LAN Channel Errors Data page (Figure 4-16) displays LAN channel errors
data.


Figure 4-16 LAN Channel Errors Data


Table 4-16 describes the data displayed in Figure 4-16.


Table 4-16 LAN Channel Errors Data

Data                    Description

Seq Retransmit          Number of times a sequenced VC packet sent on this
                        channel was retransmitted, and the channel was
                        penalized for the lost packet.

LAN Transmit Failures   Number of times the local LAN device reported a
                        failure to transmit a packet, and the channel was
                        penalized for the lost packet.

Restart Channel         Close/restart because a channel control packet was
                        received indicating that the other end closed the
                        channel and is restarting the channel handshake.

Channel Init Timeouts   Channel initialization handshake timeout.

Listen Timeouts         No packets of any kind, including HELLOs, were
                        received in LISTEN_TIMEOUT seconds.

Bad Authorization Msg   Received a CC (channel control) packet with a bad
                        authorization field.

Bad ECO CC Msg          Received a CC packet with an incompatible NISCA
                        protocol ECO rev. field value.

Bad Multicast Msg       Received a bad multicast CC packet.

CC Short Packet         Received a CC packet that was too short.

CC Incompatible         Received a CC packet that was incompatible with
                        existing channels for this virtual circuit.

Rcv Old Channel         Received a packet from an old instance of a channel.

No MSCP Server          No MSCP server was available to respond to a received
                        channel control solicit service packet asking this
                        node to boot serve another node.

Disk Not Served         Disk is not served by this system.

Buffer Size Change      Change in buffer size.


4.3.3.4 LAN Channel Remote System Data 


The LAN Channel Remote System Data page (Figure 4-17) displays LAN path
remote system data.


Figure 4-17 LAN Channel Remote System Data


Table 4-17 describes the data displayed in Figure 4-17.


Table 4-17 LAN Channel Remote System Data 


Data Description 

Node Name Node name of remote system. 

Buffer Size Buffer size (largest possible buffer size) of remote system. 
Max Buffer Size Current upper bound on buffer size usable on this channel. 
Services NISCA services supported on this channel. 

Dev Name Name of the remote LAN device. 

LAN Address Remote hardware address. 

H/W Type Hardware type of remote node. 

Protocol Version NISCA protocol version of remote system. 


4.3.3.5 LAN Channel ECS (Equivalent Channel Set) Criteria Data 


The LAN Channel ECS Criteria Data page (Figure 4-18) displays equivalent 
channel set criteria data. 
Figure 4-18 LAN Channel ECS Criteria Data


Table 4-18 describes the data displayed in Figure 4-18. 


Table 4-18 LAN Channel ECS Criteria Data

Data                    Description

ECS Membership          ECS membership status; that is, Member or Nonmember.

Time Entered ECS        Last time this channel entered the ECS.

Time Exited ECS         Last time this channel exited the ECS.

Total Time in ECS       Total time this channel was in the ECS.

Losses                  Value representing the channel’s recent packet loss
                        history.

Capacity                Channel’s capacity rating based on evaluating its
                        priority, buffer size, and hops values relative to the
                        current ECS criteria. Values are: Ungraded, Peer,
                        Inferior, Superior.

Priority                Channel’s current priority for ECS calculations; it is
                        the sum of the management priorities assigned to the
                        local LAN device and to the channel.

Management Priority     Dynamic management-assigned priority.

Buffer Size             Negotiated maximum common buffer size: the smaller of
                        the local and remote BUS$ limits on block data field
                        sizes.

Management Buffer Size  Maximum block data field size assigned by dynamic
                        management.

Hops                    Number of switches or bridges for this channel.

Management Hops         Management-supplied hops or media packet storage
                        equivalent.

Speed                   Classification of the channel’s delay relative to that
                        of the lowest delay of any ECS member.

Average RTT             Average measured round-trip time.

Load Class              Lesser of the local and remote LAN device load class
                        values.

Local Seq Number        Sequence number of the local channel.

Remote Seq Number       Sequence number of the remote channel.
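Taken together, Table 4-7 and Table 4-18 describe ECS membership in terms of three ratings: losses, capacity, and speed. The Python sketch below illustrates one reading of those rules. It is an approximation under stated assumptions: capacity is rated from priority and buffer size only (the hops criterion is ignored), a channel that is better on either value is rated Superior, and membership is assumed to require a channel that is tight, Peer, and fast. All names are hypothetical.

```python
def capacity_rating(priority: int, buf_size: int,
                    ecs_priority: int, ecs_buf_size: int,
                    lossy: bool) -> str:
    """Rate a channel's capacity against the current ECS criteria."""
    if lossy:
        # Lossy channels are usually not evaluated.
        return "Ungraded"
    if priority > ecs_priority or buf_size > ecs_buf_size:
        # Better than the current members: triggers re-evaluation.
        return "Superior"
    if priority == ecs_priority and buf_size == ecs_buf_size:
        return "Peer"
    return "Inferior"


def is_ecs_member(lossy: bool, capacity: str, fast: bool) -> bool:
    """Assumed membership rule: tight (not lossy), Peer, and fast."""
    return not lossy and capacity == "Peer" and fast


rating = capacity_rating(priority=10, buf_size=1412,
                         ecs_priority=10, ecs_buf_size=1412, lossy=False)
print(rating, is_ecs_member(lossy=False, capacity=rating, fast=True))
```

Under these assumptions, the example channel matches the current criteria exactly and prints `Peer True`.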


4.3.4 LAN Virtual Circuit Detail Data 


The Network Interconnect for System Communications Architecture (NISCA) is 
the transport protocol responsible for carrying packets such as disk I/Os and lock 
packets across Ethernet and FDDI LANs to other nodes in the cluster. 


The LAN virtual circuit details (NISCA) pages show detailed information about
the LAN Ethernet or FDDI connection between two nodes. The Availability
Manager displays one window for each LAN virtual circuit. These pages are
intended primarily to provide real-time aids for diagnosing LAN-related
cluster communications problems. HP OpenVMS Cluster Systems describes
the parameters shown on these pages and tells how to diagnose LAN-related
cluster problems.


The LAN Virtual Circuit Details pages provide the same information as the
SCACP command SHOW VC and as the following OpenVMS System Dump
Analyzer (SDA) commands: PE VC and SHOW PORTS/VC=VC_remote-node-name.
In these commands, remote-node-name is the SCS name of another node in
the cluster.


SDA defines VC_remote-node-name when it performs the first SHOW PORTS command
after SDA is started. Thus, the /CH and /VC options are valid only with the
second and subsequent SHOW PORTS commands.


You can display LAN virtual circuit details data by double-clicking a “LAN
Virtual Circuit Summary” data row on the Cluster Summary page (Figure 4-6),
or by right-clicking the row and using the shortcut menu. After a brief delay,
a LAN VC Transmit Data page (Figure 4-19) is displayed. The tabs at the top of
the page indicate additional pages that you can display.


The data items displayed depend on the type of virtual circuit. Currently, this 
feature is available only for LAN virtual circuits. 


4.3.4.1 LAN VC Transmit Data


Transmit data is information about the transmission of data packets, including 
the numbers of packets and bytes sent. Figure 4-19 is an example of a LAN VC 
Transmit Data page. 
Figure 4-19 LAN VC Transmit Data


Table 4-19 describes the data displayed in Figure 4-19.


Table 4-19 LAN VC Transmit Data

Data                    Description

Packets Sent            (Raw) count and rate of packets transmitted through
                        the virtual circuit to the remote node, including both
                        sequenced and unsequenced (channel control) packets
                        and lone acknowledgments.

Bytes Sent              (Raw) count and rate of bytes transmitted through the
                        virtual circuit.

Unsequenced (DG)        (Raw) count and rate of the number of unsequenced
                        packets that are transmitted.

Sequenced               (Raw) count and rate of sequenced packets transmitted.
                        Sequenced packets are guaranteed to be delivered.

ReXMT Ratio             Ratio of the total number of sequenced packets sent to
                        the current retransmission count.

Lone ACK                (Raw) count and rate of packets sent solely for the
                        purpose of acknowledging receipt of one or more
                        packets.

ReXMT Count             Number of packets retransmitted. Retransmission occurs
                        when the local node does not receive an acknowledgment
                        for a transmitted packet within a predetermined
                        timeout interval.

ReXMT Timeout           Number of retransmission timeouts that have occurred.

Options                 Transmit options enabled:
                        CKSM: packet checksumming
                        CMPR: compression
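The ReXMT Ratio is a simple division, but two details matter when you compute it yourself from the counters: the ratio is undefined when nothing has been retransmitted, and the Table 4-6 variant covers only the most recent data collection interval, so it must be computed from counter differences. A Python sketch with hypothetical function names; a higher ratio means fewer retransmissions per packet sent.

```python
from typing import Optional, Tuple


def rexmt_ratio(sent: int, retransmitted: int) -> Optional[float]:
    """Packets sent per retransmission; None if nothing was retransmitted."""
    if retransmitted == 0:
        return None
    return sent / retransmitted


def interval_rexmt_ratio(prev: Tuple[int, int],
                         cur: Tuple[int, int]) -> Optional[float]:
    """Ratio for one collection interval from two (sent, retransmitted)
    samples of the cumulative counters."""
    return rexmt_ratio(cur[0] - prev[0], cur[1] - prev[1])


# 2000 packets sent and 2 retransmissions during the interval -> 1000.0
print(interval_rexmt_ratio((1000, 2), (3000, 4)))
```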




4.3.4.2 LAN VC Receive Data 


Receive data is information about the receipt of data packets. Figure 4-20 is an
example of a LAN VC Receive Data page.


Figure 4-20 LAN VC Receive Data


Table 4-20 describes the data displayed in Figure 4-20.


Table 4-20 LAN VC Receive Data

Data                    Description

Packets Received        (Raw) count and rate of packets received on the
                        virtual circuit from the remote node, including both
                        sequenced and unsequenced (that is, datagram) packets
                        and lone acknowledgments.

Bytes Received          (Raw) count and rate of bytes received in packets over
                        the virtual circuit.

Unsequenced (DG)        (Raw) count and rate of unsequenced (datagram) packets
                        received.

Sequenced               (Raw) count and rate of sequenced packets received.

Lone ACK                (Raw) count and rate of lone acknowledgments received.

Duplicate               Number of duplicated packets received by this system.
                        Duplicates occur when the sending node retransmits a
                        packet, and both the original and the retransmitted
                        packets are received.

Out of Order            Number of packets received out of order by this system.

Illegal ACK             Number of illegal acknowledgments received; that is,
                        acknowledgments of an out-of-range sequence number.
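The Duplicate, Out of Order, and Illegal ACK counters all come from comparing sequence numbers against the current position of the receiver or sender. The following Python sketch shows one way to make those comparisons. It is hypothetical: it ignores sequence-number wraparound and assumes that a legal acknowledgment must refer to a packet that has been sent but not yet acknowledged.

```python
from typing import Set


def classify_received(seq: int, next_expected: int, buffered: Set[int]) -> str:
    """Classify an arriving sequenced packet.

    `buffered` holds sequence numbers already received ahead of
    `next_expected` and waiting to be delivered in order.
    """
    if seq < next_expected or seq in buffered:
        # Already delivered or already buffered: a retransmitted copy.
        return "duplicate"
    if seq == next_expected:
        return "in order"
    return "out of order"


def is_legal_ack(ack_seq: int, oldest_unacked: int, next_to_send: int) -> bool:
    """An ACK is legal only if it names a packet still awaiting acknowledgment."""
    return oldest_unacked <= ack_seq < next_to_send


print(classify_received(7, 5, {7}))   # duplicate
print(is_legal_ack(12, 8, 12))        # False: packet 12 was never sent
```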
4.3.4.3 LAN VC Congestion Control Data


LAN VC congestion control data is information about LAN traffic. The values
indicate the number of packets that can be sent to the remote node before an
acknowledgment is received, as well as the retransmission timeout.


Figure 4-21 is an example of a LAN VC Congestion Control Data page. An item 
that is dimmed indicates that the current version of OpenVMS does not support 
that item. 


Figure 4-21 LAN VC Congestion Control Data 
Table 4-21 describes the data displayed in Figure 4-21.


Table 4-21 LAN VC Congestion Control Data

Data                    Description

Transmit Window         Current value of the transmit window (or pipe quota).
Current                 After a timeout, the pipe quota is reset to 1 to
                        decrease network path congestion. The pipe quota is
                        allowed to increase as quickly as acknowledgments are
                        received.

Transmit Window Grow    The slow growth threshold: the size at which the
                        increase rate of the window is slowed to avoid
                        congestion on the network again.

Transmit Window Max     Maximum transmit window size currently allowed for the
                        virtual circuit, based on channel and remote PEDRIVER
                        receive cache limitations.

Transmit Window Max     Management override to the calculated value for the
(mgmt)                  maximum transmit window size. N/A on systems prior to
                        Version 2.0.

Transmit Window         Number of times the entire transmit window was full.
Reached                 If this number is small compared with the number of
                        sequenced packets transmitted, then either the local
                        node is not sending large bursts of data to the remote
                        node, or acknowledging packets are being received so
                        promptly that the window limit is never reached.

Roundtrip Time          Average round-trip time, in microseconds, for a packet
                        to be sent and acknowledged.

Roundtrip Deviation     Average deviation, in microseconds, of the round-trip
                        time.

Retransmit Timeout      Value, in microseconds, used to determine packet
                        retransmission timeout. If a packet does not receive
                        either an acknowledging or a responding packet, the
                        packet is assumed to be lost and will be resent.

UnAcked Packets         Current number of unacknowledged packets.

CMD Queue Length        Current length of the virtual circuit’s command queue.

CMD Queue Max           Maximum number of commands in the virtual circuit’s
                        command queue so far.
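The Roundtrip Time, Roundtrip Deviation, and Retransmit Timeout values are related: a timeout derived from both the average and the deviation adapts to a variable path without firing too early. As a stand-in illustration only, the following Python sketch uses the Jacobson/Karels estimator that TCP uses (RFC 6298: smoothing gains of 1/8 and 1/4, timeout equal to the average plus four times the deviation). PEDRIVER's actual formula and constants are not given in this guide, so treat the sketch as an analogy rather than a description of PEDRIVER; RFC 6298's minimum-timeout and clock-granularity rules are also omitted. The class name is hypothetical.

```python
from typing import Optional


class RttEstimator:
    """Smoothed round-trip time and deviation, in microseconds.

    Stand-in based on the TCP estimator in RFC 6298 (alpha = 1/8,
    beta = 1/4, timeout = srtt + 4 * rttvar); not PEDRIVER's own formula.
    """

    def __init__(self) -> None:
        self.srtt: Optional[float] = None    # analogous to Roundtrip Time
        self.rttvar: Optional[float] = None  # analogous to Roundtrip Deviation

    def sample(self, rtt_us: float) -> None:
        if self.srtt is None or self.rttvar is None:
            # First measurement initializes both estimates.
            self.srtt = rtt_us
            self.rttvar = rtt_us / 2
        else:
            # Update the deviation first, using the previous average.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_us)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_us

    def timeout_us(self) -> float:
        # Analogous to Retransmit Timeout (RFC 6298's minimum RTO omitted).
        if self.srtt is None or self.rttvar is None:
            raise ValueError("no round-trip samples yet")
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
for rtt in (800.0, 1600.0):
    est.sample(rtt)
print(est.srtt, est.rttvar, est.timeout_us())  # 900.0 500.0 2900.0
```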


4.3.4.4 LAN VC Channel Selection Data (Nonmanaged Objects) 


The display of information about LAN VC channel selection depends on the 
version of OpenVMS and whether managed objects have been enabled. (For more 
information about managed objects, see the introduction to this chapter.) 


Figure 4-22 is an example of a Nonmanaged Object LAN VC Channel Selection
Data page.


Figure 4-22 LAN VC Channel Selection Data (Nonmanaged Objects)




Table 4-22 describes the data displayed in Figure 4-22.


Table 4-22 LAN VC Channel Selection Data (Nonmanaged Objects)

Data                    Description

Buffer Size             Maximum data buffer size for this virtual circuit.

Channel Count           Number of channels available for use by this virtual
                        circuit.

Channel Selections      Number of channel selections performed.

Protocol                NISCA protocol version.

Local Device            Name of the local LAN device that the channel uses to
                        send and receive packets.

Local LAN Address       Address of the local LAN device that performs sends
                        and receives.

Remote Device           Name of the remote LAN device that the channel uses to
                        send and receive packets.

Remote LAN Address      Address of the remote LAN device performing the sends
                        and receives.


4.3.4.5 LAN VC Channel Selection Data (Managed Objects Enabled) 


Systems running Availability Manager with managed objects enabled collect and 
display the following information about LAN VC Channel Selection Data. (For 
more information about managed objects, see the introduction to this chapter.) 


Note 


An additional requirement for displaying some of the data on this data 
page is that managed objects be enabled on your system. For more 
information, see the HP Availability Manager Installation Instructions. 


Figure 4-23 is an example of a LAN VC Channel Selection Data page with
managed objects enabled.


Figure 4-23 LAN VC Channel Selection Data (Managed Objects Enabled)


Table 4-23 describes the data displayed in Figure 4-23.


Table 4-23 Channel Selection Data (Managed Objects Enabled)


Data                  Description

ECS Priority          Current minimum priority a tight channel must have in
                      order to be an ECS member.

Buffer Size           Maximum data buffer size for this virtual circuit. A
                      channel must have this buffer size in order to be an
                      ECS member.

Hops                  Current minimum management hops a channel must have in
                      order to be included in the ECS.

Channel Count         Number of channels currently available for use by this
                      virtual circuit.

Channel Selections    Number of channel selections performed.

Protocol              Remote node’s NISCA protocol version.

Speed Demote          Current threshold for reclassifying a FAST channel to
Threshold             SLOW.

Speed Promote         Current threshold for reclassifying a SLOW channel to
Threshold             FAST.

Min RTT               Current minimum average delay of any current ECS
                      members.

Min RTT Threshold     Current threshold for reclassifying a channel as FASTER
                      than the current set of ECS channels.

Mgmt Demote           A management-specified lower limit on the maximum delay
Threshold             (in microseconds) an ECS member channel can have.
                      Whenever at least one tight peer channel has a delay of
                      less than the management-supplied value, all tight peer
                      channels with delays less than the management-supplied
                      value are automatically included in the ECS. When all
                      tight peer channels have delays equal to or greater than
                      the management setting, the ECS membership delay
                      thresholds are automatically calculated and used.
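The Mgmt Demote Threshold rule amounts to a two-branch selection, which the following Python sketch models. It is a simplified illustration, not PEDRIVER's implementation: the `Channel` fields and the `auto_select` fallback (which keeps channels within twice the fastest delay) are hypothetical stand-ins for the automatically calculated thresholds, and the other ECS criteria in the table (priority, buffer size, hops) are ignored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Channel:
    name: str
    delay_us: float    # hypothetical: measured channel delay in microseconds
    tight_peer: bool   # True if this is a tight peer channel


def auto_select(tight: List[Channel]) -> List[Channel]:
    """Hypothetical stand-in for the automatically calculated ECS delay
    thresholds: keep channels whose delay is within twice the fastest."""
    if not tight:
        return []
    fastest = min(c.delay_us for c in tight)
    return [c for c in tight if c.delay_us <= 2 * fastest]


def ecs_members(channels: List[Channel], mgmt_demote_threshold_us: float,
                fallback: Callable[[List[Channel]], List[Channel]] = auto_select
                ) -> List[Channel]:
    tight = [c for c in channels if c.tight_peer]
    below = [c for c in tight if c.delay_us < mgmt_demote_threshold_us]
    if below:
        # At least one tight peer channel beats the management value:
        # every tight peer channel below that value joins the ECS.
        return below
    # All tight peer channels are at or above the management value:
    # use the automatically calculated thresholds instead.
    return fallback(tight)


chans = [Channel("A", 300, True), Channel("B", 450, True),
         Channel("C", 2000, True), Channel("D", 100, False)]
print([c.name for c in ecs_members(chans, 2500)])  # ['A', 'B', 'C']
print([c.name for c in ecs_members(chans, 200)])   # ['A', 'B'] (fallback)
```

With a generous management value, even the slow channel C is admitted; with a value that no tight peer channel meets, selection falls back to the calculated thresholds. Channel D is never considered because it is not a tight peer channel.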
4.3.4.6 LAN VC Closures Data


LAN VC closures data is information about the number of times a virtual circuit
has closed for a particular reason. Figure 4-24 is an example of a LAN VC
Closures Data page.


An entry that is dimmed indicates that the current version of OpenVMS does not 
support that item. 


Figure 4-24 LAN VC Closures Data


[Screenshot: AFFS1 Virtual Circuit to AFFS14 window, VC Closures tab, listing counters for Count, No Path, SeqMsg TMO, Topology Change, CC DFQ Empty, NPAGEDYN Low, and LAN Xmt TMO.]


Table 4-24 describes the data displayed in Figure 4-24.


Table 4-24 LAN VC Closures Data


Data              Description

No Path           Number of times the VC was closed because no usable LAN
                  path was available.

SeqMsg TMO        Number of times the VC was closed because a sequenced
                  packet’s retransmit timeout count limit was exceeded.

Topology Change   Number of times the VC was closed because PEDRIVER
                  performed a failover from a LAN path (or paths) with a
                  large packet size to a LAN path with a smaller packet size.

CC DFQ Empty      Number of times the VC was closed because the channel
                  control data-free queue (DFQ) was empty.

NPAGEDYN Low      Number of times the VC was closed because of a nonpaged
                  pool allocation failure in the local node.

LAN Xmt TMO       Number of times the VC was closed because the LAN device
                  used to send the packet did not report transmit completion
                  before the packet’s transmit timeout limit was exceeded.
4.3.4.7 LAN VC Packets Discarded Data


LAN VC packets discarded data is information about the number of times packets
were discarded for a particular reason. Figure 4-25 is an example of a LAN VC
Packets Discarded Data page.


Figure 4-25 LAN VC Packets Discarded Data


[Screenshot: COWBOX Virtual Circuit to COWBOX window, Packets Discarded tab (other tabs: Channel Selection, VC Closures, Transmit, Receive, Congestion Control), listing counters for Bad Checksum, No Xmt Chan, Rcv Short Msg, Ill Seq Msg, TR DFQ Empty, TR MFQ Empty, CC MFQ Empty, and Rcv Window Miss.]


Table 4-25 describes the data displayed in Figure 4-25.


Table 4-25 LAN VC Packets Discarded Data


Data              Description

Bad Checksum      Number of times there was a checksum failure on a received
                  packet.

No Xmt Chan       Number of times no transmit channel was available.

Rcv Short Msg     Number of times an undersized transport packet was received.

Ill Seq Msg       Number of times an out-of-range sequence numbered packet was
                  received.

TR DFQ Empty      Number of times the transmit data-free queue (DFQ) was empty.

TR MFQ Empty      Number of times the TR layer message-free queue (MFQ) was
                  empty.

CC MFQ Empty      Number of times the channel control MFQ was empty.

Rcv Window Miss   Number of packets that could not be placed in the virtual
                  circuit’s receive cache because the cache was full.
Note


Before you start this chapter, be sure to read the explanations of data 
collection, events, thresholds, and occurrences in Chapter 1. 


The Availability Manager indicates resource availability problems in the Event 
pane (Figure 5-1) of the main System Overview window (Figure 1-1). 


Figure 5-1 OpenVMS Event Pane


[Screenshot: Event pane with columns Node, Group, Date & Time, Severity, Event, and Description. The sample rows are severity 60 events from 06-Jan-2004, including HIDIOR ("HRDWR3 direct I/O rate is high"), HINTER ("XENON1 interrupt mode time is high"), HMPSYN ("WILD4 MP synchronization mode time is high"), and LOVLSP (disk volume free space is low) events. Status line: Collection [High Peaks] has 2 nodes.]


The Event pane helps you identify system problems. In many cases, you can 
apply fixes to correct these problems as well, as explained in Chapter 6. 


The Availability Manager displays a warning message in the Event pane
whenever it detects a resource availability problem. If logging is enabled (the
default), the Availability Manager also logs each event in the Events Log file,
which you can display or print. (For the location of this file and a cautionary note
about it, see Section 5.3.)


5.1 Event Information Displayed in the Event Pane 


The Availability Manager can display events for all nodes that are currently in 
communication with the Data Analyzer. When an event of a certain severity 
occurs, the Availability Manager adds the event to a list in the Event pane. 


The length of time an event is displayed depends on the severity of the event. 
Less severe events are displayed for a short period of time (30 seconds); more 
severe events are displayed until you explicitly remove the event from the Event 
pane (explained in Event Pane Menu Options). 


Data in the Event Pane 


Table 5-1 provides additional information about the data items that are displayed 
in the Event pane. 


Table 5-1 Event Pane Data 


Data Item Description 

Node Name of the node causing the event 

Group Group of the node causing the event 

Date Date the event occurred 

Time Time that an event was detected 

Sev Severity: a value from 0 to 100. (You can customize this value to
indicate the importance of the event, with 100 as the most important.)


Event Alphanumeric identifier of the type of event 


Description Short description of the resource availability problem 


Appendix B contains tables of events that are displayed in the Event pane. In 
addition, these tables contain an explanation of each event and the recommended 
remedial action. 


Event Pane Menu Options 


When you right-click a node name or data item in the Event pane, the Availability 
Manager displays a shortcut menu with the following options: 


Menu Option Description 

Display Displays the Node Summary page associated with that event. 
Remove Removes an event from the display. 

Freeze/Unfreeze Freezes a value in the display until you “unfreeze” it; a snowflake
icon is displayed to the left of an event that is frozen.


Customize Allows you to customize events. 


5.2 Criteria for Evaluating an Event 


During data collection, any time data meets or exceeds the threshold for an 
event, an occurrence counter is incremented. When the incremented value 
matches the value in the Occurrence box on the Event Customization page 
(Figure 5-2), the event is posted in the Event pane of the System Overview 
window (Figure 1-1). 
Figure 5-2 Sample Event Customization


[Screenshot: Customization - OpenVMS Default Settings dialog for the DSKERR event (high disk device error count). Event Customizations: Severity 60, Occurrence 2, Threshold 15 Error(s). Escalation actions: User, OPCOM, HP OpenView. Event explanation: "The error count for the disk device exceeds the threshold." Investigation hint: "Check error log entries for device errors. A disk device with a high error count could indicate a problem with the disk or with the connection between the disk and the system." Buttons: OK, Cancel, Apply, Help.]


The sample Event Customization page indicates a threshold of 15 errors and
an occurrence value of 2. This means that if the disk error count meets or
exceeds the threshold of 15 in two consecutive data collections, the DSKERR
event is posted in the Event pane.


Note that some events are triggered when data is lower than the threshold; other 
events are triggered when data is higher than the threshold. 


If, at any time during data collection, the data does not meet or exceed the 
threshold, the occurrence counter is set to zero, and the event is removed from 
the Event pane. Figure 5-3 depicts this sequence. 
Figure 5-3 Testing for Events


[Flowchart: For each data item that needs testing, the Availability Manager checks whether the data exceeds a threshold. If it does not, the occurrence counter is set to 0 and the event is removed from the Event pane. If it does, the occurrence counter is incremented; when the occurrence counter equals the occurrence value, the event is posted.]
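The testing sequence in Figure 5-3 behaves like a small state machine per event. The following Python sketch is an illustrative model only; `EventRule` and `EventTester` are hypothetical names, and the `trigger_below` flag models the events that fire when data is lower than the threshold. The sample run uses the DSKERR settings from Figure 5-2 (threshold 15, occurrence value 2).

```python
from dataclasses import dataclass


@dataclass
class EventRule:
    name: str
    threshold: float
    occurrence: int              # value from the Occurrence box
    trigger_below: bool = False  # True for events that fire on low data


class EventTester:
    """Hypothetical model of one event's occurrence counter."""

    def __init__(self, rule: EventRule):
        self.rule = rule
        self.counter = 0
        self.posted = False

    def _crosses(self, value: float) -> bool:
        if self.rule.trigger_below:
            return value < self.rule.threshold   # data lower than threshold
        return value >= self.rule.threshold      # data meets or exceeds it

    def test(self, value: float) -> bool:
        """Evaluate one data collection; return True while the event is posted."""
        if self._crosses(value):
            self.counter += 1
            if self.counter == self.rule.occurrence:
                self.posted = True               # post event in the Event pane
        else:
            self.counter = 0                     # reset the occurrence counter
            self.posted = False                  # remove event from the Event pane
        return self.posted


dskerr = EventTester(EventRule("DSKERR", threshold=15, occurrence=2))
print([dskerr.test(v) for v in [16, 20, 3, 15, 15]])
# [False, True, False, False, True]
```

A single collection below the threshold (the value 3 above) resets the counter, so the event must again cross the threshold in two consecutive collections before it is reposted.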


5.3 Criteria for Posting and Displaying an Event 
When an event is posted, the following actions occur: 
• The event is displayed in the Event pane.


• The data associated with the event is collected at the Event interval shown
on the Data Collection Customization page (Figure 5-4). In this example, the
event is associated with the Disk Status data collection.
Figure 5-4 OpenVMS Data Collection Customization


[Screenshot: Data Collection Customization page (Data Collection & Update Intervals) listing the data collections (Cluster summary, CPU mode, CPU process, Disk status, Disk volume, Lock contention, Memory, Node summary, Page/Swap file, Single disk, Single process) with their interval settings in seconds; for example, Disk status shows 15.0, 15.0, and 60.0. Icons indicate the customization level in effect: the Availability Manager built-in set or the Application level. Buttons: OK, Cancel, Apply, Help.]


On the Data Collection Customization page, for example, the Event interval
for the Disk Status data collection is 15 seconds.


Figure 5-5 OpenVMS Group/Node Pane


[Screenshot: Group/Node pane with columns Node Name, CPU, Active CPUs, MEM, BIO, DIO, CPU Qs, Events, Proc Ct, OS Version, and HW Model, listing nodes that run OpenVMS V7.1 through V8.2-1 on VAXstation, VAX 6000, AlphaServer, and HP rx-series systems.]


When an event is posted, the following actions also occur: 


• The Events field in the Group/Node pane is incremented, and the node icon
in the Node Name field turns red (see Figure 5-5). To see the events posted
for this node in a tooltip, place the mouse pointer over the node name.


• By default, the event is added to the Events Log file:
- On OpenVMS systems, the Events Log file is:


AMDS$AM_LOG:ANALYZER_EVENTS.LOG


A new version of this file is created each time you access the Availability 
Manager. 
- On Windows systems, the Events Log file is:


Analyzer_Events.log


This file, which is in the installation directory, is overwritten each time 
you access the Availability Manager. 


The following example shows a partial event log file: 


VAXJET  01-22-2004 11:24:50.67   0  CFGDON  VAXJET configuration done
DBGAVC  01-22-2004 11:25:12.41   0  CFGDON  DBGAVC configuration done
AFFS5   01-22-2004 11:25:13.23   0  CFGDON  AFFS5 configuration done
DBGAVC  01-22-2004 11:25:18.31   8  LCKCNT  DBGAVC possible contention for resource REG$MASTER_LOCK
VAXJET  01-22-2004 11:25:27.47   4  LOBIOQ  VAXJET LES$ACP_V30 has used most of its BIOLM process quota
PEROIT  01-22-2004 11:25:27.16   0  CFGDON  PEROIT configuration done
KOINE   01-22-2004 11:25:33.05  99  NOSWFL  KOINE has no swap file
MAWK    01-22-2004 11:26:20.15  99  FXTIMO  MAWK Fix timeout for FID to Filename Fix
MAWK    01-22-2004 11:26:24.48  60  HIDIOR  MAWK direct I/O rate is high
REDSQL  01-22-2004 11:26:30.61  10  PRPGFL  REDSQL _FTA2: high page fault rate
REDSQL  01-22-2004 11:26:31.18  60  PRPIOR  REDSQL _FTA7: paging I/O rate is high
MAWK    01-22-2004 11:26:24.48  60  HIDIOR  MAWK direct I/O rate is high
AFFS52  01-22-2004 11:25:33.64  60  DSKMNV  AFFS52 $4$DUA320(OMTV4) disk mount verify in progress
VAXJET  01-22-2004 11:38:46.23  90  DPGERR  VAXJET error executing driver program,
REDSQL  01-22-2004 11:39:18.73  60  PRCPWT  REDSQL _FTA2: waiting in PWAIT
REDSQL  01-22-2004 11:44:37.19  75  PRCCUR  REDSQL _FTA7: has a high CPU rate
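Because each logged event occupies a single line, the events log is easy to post-process, for example to pull out only the severe events. The following Python sketch assumes each line holds, in order, the node name, date (MM-DD-YYYY), time, severity, event identifier, and description, separated by whitespace; that layout is inferred from the sample rather than documented, and `parse_event_line` and `severe` are hypothetical helpers.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LoggedEvent:
    node: str
    when: datetime
    severity: int
    event: str
    description: str


def parse_event_line(line: str) -> LoggedEvent:
    # Split off the five fixed fields; the rest of the line is the description.
    node, date, time, sev, event, description = line.split(maxsplit=5)
    when = datetime.strptime(f"{date} {time}", "%m-%d-%Y %H:%M:%S.%f")
    return LoggedEvent(node, when, int(sev), event, description)


def severe(lines, minimum=60):
    """Return the parsed events whose severity is at least `minimum`."""
    events = (parse_event_line(l) for l in lines if l.strip())
    return [e for e in events if e.severity >= minimum]


sample = [
    "KOINE   01-22-2004 11:25:33.05  99  NOSWFL  KOINE has no swap file",
    "REDSQL  01-22-2004 11:26:30.61  10  PRPGFL  REDSQL _FTA2: high page fault rate",
]
for e in severe(sample):
    print(e.node, e.event, e.description)
```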


Caution About Events Logs 


If you collect data on many nodes, running the Availability Manager for 
a long period of time can result in a large events log. For example, in a 
run that monitors more than 50 nodes with most of the background data 
collection enabled, the events log can grow by up to 30 MB per day. At 
this rate, systems with small disks might fill up the disk on which the 
events log resides. 


Closing the Availability Manager application will enable you to access the 
events log for tasks such as archiving. Starting the Availability Manager 
starts a new events log. 
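The growth figure in the Caution lends itself to a quick capacity estimate. The following Python sketch assumes the events log grows linearly at the quoted rate of 30 MB per day; `days_until_full` is a hypothetical helper, and your actual rate depends on how many nodes you monitor and which data collections are enabled.

```python
def days_until_full(free_mb: float, growth_mb_per_day: float = 30.0) -> float:
    """Estimate how many days of logging fit in free_mb of free disk space,
    assuming the events log grows linearly at growth_mb_per_day."""
    if growth_mb_per_day <= 0:
        raise ValueError("growth_mb_per_day must be positive")
    return free_mb / growth_mb_per_day


print(days_until_full(450))        # 15.0
print(days_until_full(450, 10.0))  # 45.0
```

On OpenVMS, earlier versions of the log remain on disk until they are purged or deleted, so free space shrinks across restarts as well.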


5.4 Displaying Additional Event Information 


For more detailed information about a specific event, double-click any event data 
item in the Event pane. The Availability Manager first displays a data page that 
most closely corresponds to the cause of the event. You can choose other tabs for 
additional detailed information. 


For a description of data pages and the information they contain, see Chapter 3. 
Performing Fixes on OpenVMS Nodes


Fixes allow you to resolve resource availability problems and improve system 
availability. 


This chapter discusses the following topics: 
• Understanding fixes
• Performing fixes


Caution 


Performing certain fixes can have serious repercussions, including 
possible system failure. Therefore, only experienced system managers 
should perform fixes. 


6.1 Understanding Fixes 


When you suspect or detect a resource availability problem, in many cases you 
can use the Availability Manager to analyze the problem and to perform a fix to 
improve the situation. 


Availability Manager fixes fall into these categories: 
• Node fixes
• Process fixes
• Cluster interconnect fixes


You can access fixes, by category, from the pages listed in Table 6-1. 


Table 6-1 Accessing Availability Manager Fixes


Fix Category and Name             Available from This Page

Node fixes:                       Node Summary
  Crash Node                      CPU
  Adjust Quorum                   Memory Summary
                                  I/O Process
                                  SCA Port
                                  SCA Circuit
                                  LAN Virtual Circuit
                                  LAN Path (Channel)
                                  LAN Device

Process fixes:                    All of the process fixes are available
  General process fixes:          from the following pages:
    Delete Process                Memory Summary
    Exit Image                    I/O Process
    Suspend Process               CPU Process
    Resume Process                Single Process
    Process Priority
  Process memory fixes:
    Purge Working Set (WS)
    Adjust Working Set (WS)
  Process limits fixes:
    Direct I/O
    Buffered I/O
    AST
    Open file
    Lock
    Timer
    Subprocess
    I/O Byte
    Pagefile Quota

Cluster interconnect fixes:       These fixes are available from the
                                  following lines of data on the Cluster
                                  Summary page (Figure 4-1):

  SCA Port: Adjust Priority       Right-click a data item on the Local Port
                                  Data display line to display a menu. Then
                                  select Port Fix....

  SCA Circuit: Adjust Priority    Right-click a data item on the Circuits
                                  Data display line to display a menu. Then
                                  select Circuit Fix....

  LAN Virtual Circuit Summary:    Right-click a data item on the LAN Virtual
    Maximum Transmit Window Size  Circuit Summary line to display a menu.
    Maximum Receive Window Size   Then select VC LAN Fix.... Alternatively,
    Checksumming                  you can use the Fix menu on the LAN VC
    Compression                   Details page.
    ECS Maximum Delay

  LAN Path (Channel) Summary:     Right-click a data item on the LAN Path
    Adjust Priority               (Channel) Summary line to display a menu.
    Hops                          Then select Fixes.... Alternatively, you
                                  can use the Fix menu on the Channel Details
                                  page.

  LAN Device Details:             You can access these fixes in the
    Adjust Priority               following ways:
    Set Maximum Buffer Size       • Right-click an item in the LAN Path
    Start LAN Device                (Channel) Summary category to display a
    Stop LAN Device                 menu. Then select LAN Device Details...
                                    to display pages containing Fix options.
                                  • Right-click an item in the LAN Device
                                    Summary page and then select LAN Device
                                    Fixes....
                                  • Select Fixes... on the LAN Device Details
                                    page.


Table 6-2 summarizes various problems, recommended fixes, and the expected
results of fixes.


Table 6-2 Summary of Problems and Matching Fixes


Problem                         Fix                      Result

Node resource hanging cluster   Crash Node               Node fails with
                                                         operator-requested shutdown.
                                                         See Section 6.2.2 for the
                                                         crash dump footprint for this
                                                         type of shutdown.

Cluster hung                    Adjust Quorum            Quorum for cluster is adjusted.

Process looping, intruder       Delete Process           Process no longer exists.

Endless process loop in same    Exit Image               Exits from current image.
PC range

Runaway process, unwelcome      Suspend Process          Process is suspended from
intruder                                                 execution.

Process previously suspended    Resume Process           Process resumes from the point
                                                         at which it was suspended.

Runaway process or process      Process Priority         Base priority changes to
that is overconsuming                                    selected setting.

Low node memory                 Purge Working Set (WS)   Frees memory on node; page
                                                         faulting might occur for the
                                                         affected process.

Working set too high or low     Adjust Working Set (WS)  Removes unused pages from
                                                         working set; page faulting
                                                         might occur.

Process quota has reached its   Adjust Process Limits    Process limit is increased,
limit and has entered RWAIT                              which in many cases frees the
state                                                    process to continue execution.

Process has exhausted its       Adjust Pagefile Quota    Pagefile quota limit of the
pagefile quota                                           process is adjusted.


Most process fixes correspond to an OpenVMS system service call, as shown in 
the following table: 
Process Fix                         System Service Call

Delete Process                      $DELPRC
Exit Image                          $FORCEX
Suspend Process                     $SUSPND
Resume Process                      $RESUME
Process Priority                    $SETPRI
Purge Working Set (WS)              $PURGWS
Adjust Working Set (WS)             $ADJWSL
Adjust process limits of the        None
following:
  Direct I/O (DIO)
  Buffered I/O (BIO)
  Asynchronous system trap (AST)
  Open file (FIL)
  Lock queue (ENQ)
  Timer queue entry (TQE)
  Subprocess (PRC)
  I/O byte (BYT)


Note 


Each fix that uses a system service call requires that the process execute 
the system service. A hung process has the fix queued to it, and the fix 
does not execute until the process is operational again. 
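The queuing behavior described in the Note can be pictured with a toy model. The following Python sketch does not show how OpenVMS actually delivers the request; `ProcessModel`, `queue_fix`, and `run` are hypothetical names, and the model captures only the rule that a queued fix has no effect until the target process executes again.

```python
from collections import deque


class ProcessModel:
    """Toy model: a fix takes effect only when the target process executes."""

    def __init__(self, name: str):
        self.name = name
        self.hung = False
        self.pending = deque()   # fixes queued to the process
        self.executed = []       # fixes the process has carried out

    def queue_fix(self, fix: str) -> None:
        self.pending.append(fix)
        self.run()               # attempt delivery immediately

    def run(self) -> None:
        # A hung process cannot execute the system service, so queued
        # fixes stay pending until the process is operational again.
        while not self.hung and self.pending:
            self.executed.append(self.pending.popleft())


p = ProcessModel("DNS$ADVER")
p.hung = True
p.queue_fix("Exit Image ($FORCEX)")
print(p.executed)   # []
p.hung = False      # the process becomes operational again
p.run()
print(p.executed)   # ['Exit Image ($FORCEX)']
```

In this model, applying the same fix repeatedly to a hung process only lengthens the pending queue; nothing happens until the process runs.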


Be aware of the following facts before you perform a fix: 


• You must have write access to perform a fix. To perform LAN fixes, you must
have control access.


• You cannot undo many fixes. For example, after using the Crash Node fix, the
node must be rebooted (either by the node if the node reboots automatically,
or by a person performing a manual boot).


• Do not apply the Exit Image, Delete Process, or Suspend Process fix to system
processes. Doing so might require you to reboot the node.


• Whenever you exit an image, you cannot return to that image.


• You cannot delete processes that have exceeded their job or process quota.


• The Availability Manager ignores fixes applied to the SWAPPER process.


How to Perform Fixes 


Standard OpenVMS privileges restrict users’ write access. When you run the 
Data Analyzer, you must have the CMKRNL privilege to send a write (fix) 
instruction to a node with a problem. 


The following options are displayed at the bottom of all fix pages: 
Option    Description

OK        Applies the fix and then exits the page. Any message associated
          with the fix is displayed in the Event pane.

Cancel    Cancels the fix.

Apply     Applies the fix and does not exit the page. Any message
          associated with the fix is displayed in the Return Status section
          of the page and in the Event pane.


The following sections explain how to perform node fixes and process fixes. 


6.2 Performing Node Fixes 


One node fix, Crash Node, is unique among Availability Manager fixes: it allows
you to deliberately fail (crash) a node. Another node fix, Adjust Quorum, allows
you to adjust cluster quorum.


To perform a node fix, follow these steps: 
1. On the Node Summary, CPU, Memory, or I/O page, select the Fix menu. 
2. Select Fix Options. 
6.2.1 Adjust Quorum


The default node fix displayed is the Adjust Quorum fix, which forces a node to
recalculate the quorum value. This fix is the equivalent of the Interrupt Priority
level C (IPC) mechanism used at system consoles for the same purpose. The fix
forces the adjustment for the entire cluster so that each node in the cluster has
the same new quorum value.


The Adjust Quorum fix is useful when the number of votes in a cluster falls below 
the quorum set for that cluster. This fix allows you to readjust the quorum so 
that it corresponds to the current number of votes in the cluster. 
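The effect of Adjust Quorum can be illustrated with the OpenVMS Cluster quorum rule, quorum = (EXPECTED_VOTES + 2) / 2 with any fraction discarded. The following Python sketch is a simplified model: it ignores quorum disks and other inputs that OpenVMS uses, the function names are hypothetical, and it assumes that the fix recomputes quorum from the votes currently present, as described above.

```python
def quorum(votes: int) -> int:
    """Quorum for a vote total: (votes + 2) / 2 with the fraction discarded."""
    return (votes + 2) // 2


def cluster_can_run(current_votes: int, quorum_value: int) -> bool:
    """A cluster continues operating only while its votes reach quorum."""
    return current_votes >= quorum_value


expected = 5                            # five members, one vote each
q = quorum(expected)                    # 3
current = 2                             # three members have left the cluster
print(cluster_can_run(current, q))      # False: the cluster hangs
q = quorum(current)                     # Adjust Quorum recalculates: 2
print(cluster_can_run(current, q))      # True: operation can resume
```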


The Adjust Quorum page is shown in Figure 6-1. 


Figure 6-1 Adjust Quorum


[Screenshot: Availability Manager Fixes dialog, Node tab, for node FXEF80. Fix Type: Adjust Quorum (Adjusts cluster quorum). Explanation: "This fix will cause the cluster to recalculate the cluster quorum. This fix will allow a cluster that is hung because it has lost quorum to regain quorum and resume operation." A Return Status area and OK, Cancel, Apply, and Help buttons appear at the bottom.]
6.2.2 Crash Node


Caution


The Crash Node fix is an operator-requested bugcheck from the driver.
It takes place as soon as you click OK in the Crash Node fix. After you
perform this fix, the node cannot be restored to its previous state. After a
crash, the node must be rebooted.


When you select the Crash Node option, the Availability Manager displays the 
Crash Node page, shown in Figure 6-2. 


Figure 6-2 Crash Node


[Screenshot: Availability Manager Fixes dialog, Node tab, for node QTV18. Fix Type: Crash Node (Crashes the node). Explanation: "This fix will attempt to crash the node. A successful return status means that the connection to the node has been severed. CAUTION: Use as a last resort only!" A Return Status area and OK, Cancel, Apply, and Help buttons appear at the bottom.]


Note 


Because the node cannot report a confirmation when a Crash Node fix 
is successful, the crash success message is displayed after the timeout 
period for the fix confirmation has expired. 


Recognizing a System Failure Forced by the Availability Manager 

Because a user with suitable privileges can force a node to fail from the Data 
Analyzer by using the Crash Node fix, system managers have requested a method 
for recognizing these particular failure footprints so that they can distinguish 
them from other failures. These failures all have identical footprints: they are 
operator-induced system failures in kernel mode at IPL 8. The top of the kernel 
stack is similar to the following display:
SP =>  Quadword system address
       Quadword data
       1BEEDEAD.00000000
       00000000.00000000
       Quadword data        TRAP$CRASH
       Quadword data        SYS$RMDRIVER + offset


6.3 Performing Process Fixes 
To perform a process fix, follow these steps: 
1. On the Memory or I/O page, right-click a process name. 


2. Click Fix Options.
The Availability Manager displays these Process tabs:


Process General
Process Memory
Process Limits


3. Click one of these tabs to bring it to the front.


4. Click the down arrow to display the process fixes in this group, as shown in 
Figure 6-3, where the Process General tab has been chosen. 


Figure 6-3 Process General Options


[Screenshot: Availability Manager Fixes dialog, Process General tab, for node MONSON, process DNS$ADVER (0000004C). Fix Type: Process Priority (Changes the base priority of the process), with a Fix Value of 4. A Return Status area and OK, Cancel, Apply, and Help buttons appear at the bottom.]


5. Select a process fix (for example, Process Priority, shown in Figure 6-3) to
display a fix page.


Some of the fixes, such as Process Priority, require you to use a slider to change 
the default value. When you finish setting a new process priority, click Apply at 
the bottom of the page to apply that fix. 
6.3.1 General Process Fixes


The following sections describe the Availability Manager general process fixes
and explain how to delete a process, exit an image, and suspend and resume a
process.


6.3.1.1 Delete Process 


In most cases, a Delete Process fix deletes a process. However, if a process is 
waiting for disk I/O or is in a resource wait state (RWAST), this fix might not 
delete the process. In this situation, it is useless to repeat the fix. Instead, 
depending on the resource the process is waiting for, a Process Limit fix might 
free the process. As a last resort, reboot the node to delete the process. 


Caution 


Deleting a system process can cause the system to hang or become 
unstable. 


When you select the Delete Process option, the Availability Manager displays the 
page shown in Figure 6-4. 


Figure 6-4 Delete Process


[Screenshot: Availability Manager Fixes dialog, Process General tab (other tabs: Node, Process Memory, Process Limits), for node MONSON, process DNS$ADVER (0000004C). Fix Type: Delete Process (Deletes the selected process). Explanation: "This fix will cause the process to be deleted. However, if the process is hung because it has exhausted a process resource limit, this fix may not be able to delete the process. First adjust the process resource limit with a Process Limit fix." A Return Status area and OK, Cancel, Apply, and Help buttons appear at the bottom.]


After reading the explanation, click Apply at the bottom of the page to apply the 
fix. A message displayed on the page indicates that the fix has been successful. 
6.3.1.2 Exit Image


Exiting an image on a node can stop an application that a user requires. Before
you exit an image, check the Single Process page to determine which image the
process is running.


Caution 


Exiting an image on a system process could cause the system to hang or 
become unstable. 


When you select the Exit Image option, the Availability Manager displays the 
page shown in Figure 6-5. 


Figure 6-5 Exit Image Page


[Screenshot: Availability Manager Fixes dialog, Process General tab, for node MONSON, process DNS$ADVER (0000004C). Fix Type: Exit Image (Forces the image of the process to exit). A Return Status area and OK, Cancel, Apply, and Help buttons appear at the bottom.]


After reading the explanation in the page, click Apply at the bottom of the page 
to apply the fix. A message displayed on the page indicates that the fix has been 
successful. 
6.3.1.3 Suspend Process
Suspending a process that is consuming excess CPU time can improve perceived 
CPU performance on the node by freeing the CPU for other processes to use. 
(Conversely, resuming a process that was using excess CPU time while running 
might reduce perceived CPU performance on the node.) 


Caution 


Do not suspend system processes, especially JOB_CONTROL, because 
this might make your system unusable. (For more information, see HP 
OpenVMS Programming Concepts Manual, Volume I.) 


When you select the Suspend Process option, the Availability Manager displays 
the page shown in Figure 6-6. 


Figure 6-6 Suspend Process 


[Screenshot: Suspend Process fix page for node MONSON, process DNS$ADVER
(0000004C). Explanation: "Suspends the process. This fix is equivalent to
$ SET PROCESS/SUSPEND." Buttons: OK, Cancel, Apply, Help.]


After reading the explanation, click Apply at the bottom of the page to apply the 
fix. A message displayed on the page indicates that the fix has been successful. 
6.3.1.4 Resume Process


Resuming a process that was using excess CPU time while running might reduce 
perceived CPU performance on the node. (Conversely, suspending a process 
that is consuming excess CPU time can improve perceived CPU performance by 
freeing the CPU for other processes to use.) 


When you select the Resume Process option, the Availability Manager displays 
the page shown in Figure 6-7. 


Figure 6-7 Resume Process 


[Screenshot: Resume Process fix page for node MONSON, process DNS$ADVER
(0000004C). Explanation: "Resumes the process. This fix is equivalent to
$ SET PROCESS/RESUME." Buttons: OK, Cancel, Apply, Help.]


After reading the explanation, click Apply at the bottom of the page to apply the 
fix. A message displayed on the page indicates that the fix has been successful. 
6.3.1.5 Process Priority


If the priority of a compute-bound process is too high, the process can consume 
all the CPU cycles on the node, affecting performance dramatically. On the other 
hand, if the priority of a process is too low, the process might not obtain enough 
CPU cycles to do its job, also affecting performance. 


When you select the Process Priority option, the Availability Manager displays 
the page shown in Figure 6-8. 


Figure 6-8 Process Priority 


[Screenshot: Process Priority fix page for node MONSON, process DNS$ADVER
(0000004C). Explanation: "Changes the base priority of the process." A Fix
Value slider sets the new base priority. Buttons: OK, Cancel, Apply, Help.]


To change the base priority for a process, drag the slider on the scale to the 
number you want. The current priority number is displayed in a small box above 
the slider. You can also click the line above or below the slider to adjust the 
number by 1. 


When you are satisfied with the new base priority, click Apply at the bottom of 
the page to apply the fix. A message displayed on the page indicates that the fix 
has been successful. 
6.3.2 Process Memory Fixes


The following sections describe the Availability Manager fixes you can use to
correct process memory problems: the Purge Working Set and Adjust Working Set
fixes.


6.3.2.1 Purge Working Set 


This fix purges the working set to a minimal size. You can use this fix to reclaim 
a process’s pages that are not in active use. If the process is in a wait state, the 
working set remains at a minimal size, and the purged pages become available 
for other uses. If the process becomes active, pages the process needs are
page-faulted back into memory, and the unneeded pages are available for other uses.


Be careful not to repeat this fix too often: a process that continually reclaims 
needed pages can cause excessive page faulting, which can affect system 
performance. 


When you select the Purge Working Set option, the Availability Manager displays 
the page shown in Figure 6-9. 


Figure 6-9 Purge Working Set 


[Screenshot: Purge Working Set fix page for node ALMOST, process
DECW$SERVER_0 (00000066). Explanation: "Purges the working set of the
process. This fix reduces the working set of the process to a minimal size."
Buttons: OK, Cancel, Apply, Help.]


After reading the explanation on the page, click Apply at the bottom of the page 
to apply the fix. A message displayed on the page indicates that the fix has been 
successful. 
6.3.2.2 Adjust Working Set
Adjusting the working set of a process might prove to be useful in a variety of 
situations. Two of these situations are described in the following list. 


• If a process is page-faulting because of insufficient memory, you can reclaim
unused memory from other processes by decreasing the working set of one or
more of them.


• If a process is page-faulting too frequently because its working set is too
small, you can increase its working set.


Caution 


If the automatic working set adjustment is enabled for the system, a 
fix to adjust the working set size disables the automatic adjustment for 
the process. For more information, see the OpenVMS online help for SET
WORKING_SET/ADJUST, which includes /NOADJUST.


When you select the Adjust Working Set fix, the Availability Manager displays 
the page shown in Figure 6-10. 


Figure 6-10 Adjust Working Set 


[Screenshot: Adjust Working Set fix page for node ALMOST, process
DECW$SERVER_0 (00000066). Explanation: "Adjusts the working set size of a
process. There are two caveats for this fix: This fix disables the automatic
working set adjustment for the process. The adjusted working set value cannot
exceed WSQUOTA for the process or WSMAX for the system. Memory is represented
in 512 byte units." Fix Value: 7904. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the working set to the limit you want. 
You can also click the line above or below the slider to adjust the number by 1. 


When you are satisfied with the new working set limit, click Apply at the bottom 
of the page to apply the fix. A message displayed on the page indicates that the 
fix has been successful. 
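The two limits stated on the Adjust Working Set page can be sketched as a simple check. This is an illustrative model only, not the Availability Manager's implementation: the function name `check_ws_request` is hypothetical, the quota and maximum values in the example are assumed, and `wsquota` and `wsmax` are assumed to be expressed in the same 512-byte units as the fix value.

```python
PAGELET_BYTES = 512  # the fix page states that memory is shown in 512-byte units


def check_ws_request(requested: int, wsquota: int, wsmax: int) -> int:
    """Validate a requested working set size given in 512-byte units.

    Returns the size in bytes if allowed. Raises ValueError if the request
    exceeds WSQUOTA for the process or WSMAX for the system.
    """
    if requested > wsquota:
        raise ValueError(f"{requested} exceeds WSQUOTA ({wsquota})")
    if requested > wsmax:
        raise ValueError(f"{requested} exceeds WSMAX ({wsmax})")
    return requested * PAGELET_BYTES


# 7904 is the value shown in Figure 6-10; the quota and maximum are hypothetical.
print(check_ws_request(7904, wsquota=8192, wsmax=16384))  # 4046848
```

Keeping the unit conversion in one constant makes it easy to see that a slider value of 7904 corresponds to about 4 MB of memory.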
6.3.3 Process Limits Fixes


If a process is waiting for a resource, you can use a Process Limits fix to increase 
the resource limit so that the process can continue. The increased limit is in 
effect only for the life of the process, however; any new process is assigned the 
quota that was set in the UAF. 
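The lifetime rule above can be modeled in a few lines. The sketch is a hypothetical illustration: the `UAF_QUOTAS` dictionary and `Process` class stand in for a UAF record and a running process, and the quota values are assumed examples. The point it shows is that a Process Limits fix changes only the running process's copy of a quota, never the UAF record that new processes start from.

```python
from dataclasses import dataclass, field

# Hypothetical UAF record: quotas assigned to every new process for this user.
UAF_QUOTAS = {"DIOLM": 100, "BIOLM": 100, "FILLM": 200}


@dataclass
class Process:
    name: str
    # Each new process starts with its own copy of the UAF quotas.
    limits: dict = field(default_factory=lambda: dict(UAF_QUOTAS))

    def apply_limit_fix(self, quota: str, value: int) -> None:
        """Raise one limit for this process only; the UAF is not modified."""
        self.limits[quota] = value


p1 = Process("WORKER_1")
p1.apply_limit_fix("FILLM", 400)  # Process Limits fix on a running process
p2 = Process("WORKER_2")          # a process created after the fix
print(p1.limits["FILLM"], p2.limits["FILLM"])  # 400 200
```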


When you click the Process Limits tab, you can select any of the following options: 


Direct I/O
Buffered I/O
AST
Open File
Lock
Timer
Subprocess
I/O Byte
Pagefile Quota


These fix options are described in the following sections. 


6.3.3.1 Direct I/O Count Limit 


You can use this fix to adjust the direct I/O count limit of a process. When you 
select the Direct I/O option, the Availability Manager displays the page shown in 
Figure 6-11. 


Figure 6-11 Direct I/O Count Limit 


[Screenshot: Direct I/O fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Direct I/O count limit of the process."
Fix Value: 100. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the direct I/O count to the limit you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new direct I/O count limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
6.3.3.2 Buffered I/O Count Limit


You can use this fix to adjust the buffered I/O count limit of a process. When you 
select the Buffered I/O option, the Availability Manager displays the page shown 
in Figure 6-12. 


Figure 6-12 Buffered I/O Count Limit 


[Screenshot: Buffered I/O fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Buffered I/O count limit of the
process." Fix Value: 100. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the buffered I/O count to the limit you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new buffered I/O count limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
6.3.3.3 AST Queue Limit


You can use this fix to adjust the AST queue limit of a process. When you select 
the AST option, the Availability Manager displays a page similar to the one 
shown in Figure 6-13. 


Figure 6-13 AST Queue Limit 


[Screenshot: AST fix page for node ALMOST, process DECW$SERVER_0 (00000066).
Explanation: "Adjusts the AST Queue limit of the process." Fix Value: 100.
Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the AST queue limit to the number 
you want. You can also click the line above or below the slider to adjust the 
number by 1. 


When you are satisfied with the new AST queue limit, click Apply at the bottom 
of the page to apply the fix. A message displayed on the page indicates that the 
fix has been successful. 
6.3.3.4 Open File Limit


You can use this fix to adjust the open file limit of a process. When you select 
the Open File option, the Availability Manager displays a page similar to the one 
shown in Figure 6-14. 


Figure 6-14 Open File Limit 


[Screenshot: Open File fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Open File limit of the process."
Fix Value: 200. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the open file limit to the number you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new open file limit, click Apply at the bottom of 
the page to apply the fix. A message displayed on the page indicates that the fix 
has been successful. 
6.3.3.5 Lock Queue Limit


You can use this fix to adjust the lock queue limit of a process. When you select 
the Lock option, the Availability Manager displays a page similar to the one 
shown in Figure 6-15. 


Figure 6-15 Lock Queue Limit 


[Screenshot: Lock fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Lock Queue limit of the process."
Fix Value: 512. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the lock queue limit to the number you 
want. You can also click the line above or below the slider to adjust the number 
by 1. 


When you are satisfied with the new lock queue limit, click Apply at the bottom 
of the page to apply the fix. A message displayed on the page indicates that the 
fix has been successful. 
6.3.3.6 Timer Queue Entry Limit


You can use this fix to adjust the timer queue entry limit of a process. When 
you select the Timer option, the Availability Manager displays the page shown in 
Figure 6-16. 


Figure 6-16 Timer Queue Entry Limit 


[Screenshot: Timer fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Timer Queue entry limit of the
process." Fix Value: 8. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the timer queue entry limit to the
number you want. You can also click the line above or below the slider to adjust
the number by 1.


When you are satisfied with the new timer queue entry limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
6.3.3.7 Subprocess Creation Limit


You can use this fix to adjust the subprocess creation limit of a process.
When you select the Subprocess option, the Availability Manager displays the
page shown in Figure 6-17.


Figure 6-17 Subprocess Creation Limit 


[Screenshot: Subprocess fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Subprocess Creation limit of the
process." Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the subprocess creation limit of a 
process to the number you want. You can also click the line above or below the 
slider to adjust the number by 1. 


When you are satisfied with the new subprocess creation limit, click Apply at the 
bottom of the page to apply the fix. A message displayed on the page indicates 
that the fix has been successful. 
6.3.3.8 I/O Byte


You can use this fix to adjust the I/O byte limit of a process. When you select
the I/O Byte option on the movable bar, the Availability Manager displays a page
similar to the one shown in Figure 6-18.


Figure 6-18 I/O Byte 


[Screenshot: I/O Byte fix page for node ALMOST, process DECW$SERVER_0
(00000066). Explanation: "Adjusts the Buffered I/O Byte limit of the process.
The value entered will be rounded up to the next 64 byte boundary."
Fix Value: 60000 (slider scale up to 100000). Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the I/O byte limit to the number you
want. You can also click the line above or below the slider to adjust the number
by 1.


When you are satisfied with the new I/O byte limit, click Apply at the bottom of
the page to apply the fix. A message displayed on the page indicates that the fix
has been successful.
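The Explanation box in Figure 6-18 notes that the value entered is rounded up to the next 64-byte boundary. A minimal sketch of that rounding follows; `round_up_64` is a hypothetical helper, and the sketch assumes that a value already on a 64-byte boundary is left unchanged, which the page does not state explicitly.

```python
def round_up_64(value: int) -> int:
    """Round a byte count up to the next multiple of 64."""
    return (value + 63) // 64 * 64


for entered in (60000, 100000, 60032):
    print(entered, "->", round_up_64(entered))
# 60000 -> 60032
# 100000 -> 100032
# 60032 -> 60032
```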




6.3.3.9 Pagefile Quota 


You can use this fix to adjust the pagefile quota limit of a process. This quota
is shared among all the processes in a job. When you select the Pagefile Quota
option, the Availability Manager displays the page shown in Figure 6-19.


Figure 6-19 Pagefile Quota 


[Screenshot: Pagefile Quota fix page for node ANDA1A, process DNS$ADVER
(27800418). Explanation: "Adjusts the Pagefile quota limit of the process."
Fix Value: 3125. Buttons: OK, Cancel, Apply, Help.]


To perform this fix, use the slider to adjust the pagefile quota limit to the number 
you want. You can also click above or below the slider to adjust the fix value by 1. 


When you are satisfied with the new pagefile quota limit, click Apply at the
bottom of the page to apply the fix. A message displayed on the page indicates
that the fix has been successful.
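Because the pagefile quota is shared among all the processes in a job, one process can exhaust it for the others. The following is a hypothetical model of such a pooled quota, not a description of how OpenVMS stores it: the `Job` class, its methods, and the page counts are assumptions made for illustration. In the model, raising the quota increases the pool that every process in the job draws from.

```python
class Job:
    """Hypothetical model of a job whose processes share one pagefile quota."""

    def __init__(self, pgflquota: int) -> None:
        self.pgflquota = pgflquota
        self.used: dict[str, int] = {}  # process name -> pages charged

    def remaining(self) -> int:
        return self.pgflquota - sum(self.used.values())

    def charge(self, process: str, pages: int) -> bool:
        """Charge pages to a process; fail if the shared pool is exhausted."""
        if pages > self.remaining():
            return False
        self.used[process] = self.used.get(process, 0) + pages
        return True


job = Job(pgflquota=3125)
job.charge("MAIN", 2000)
job.charge("SUB_1", 1000)
print(job.remaining())           # 125
print(job.charge("SUB_2", 500))  # False: the pool is shared, not per process
job.pgflquota += 1000            # a Pagefile Quota fix raises the shared limit
print(job.charge("SUB_2", 500))  # True
```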
6.4 Performing Cluster Interconnect Fixes
Note 


All cluster interconnect fixes require that managed objects be enabled. 


The following are categories of cluster interconnect fixes: 
• Port adjust priority fix

• Circuit adjust priority fix

• LAN virtual circuit (VC) summary fixes

• LAN channel (path) fixes

• LAN device fixes


The following sections describe these types of fixes. The descriptions also indicate 
whether or not the fix is currently available. 
6.4.1 Port Adjust Priority Fix


To access the Port Adjust Priority fix, right-click a data item in the Local Port
Data display line (see Figure 4-3). The Availability Manager displays a shortcut
menu with the Port Fix option.


This page (Figure 6-20) allows you to change the cost associated with this port, 
which, in turn, affects the routing of cluster traffic. 


Figure 6-20 Port Adjust Priority 


[Screenshot: Port Adjust Priority fix page for port EBJB27 PNA0. Explanation:
"Adjust the management priority for the Port. This fix changes the cost
associated with this Port which in turn affects the routing of cluster
traffic." Fix Value: 0. Buttons: OK, Cancel, Apply, Help.]
6.4.2 Circuit Adjust Priority Fix


To access the Circuit Adjust Priority fix, right-click a data item in the circuits
data display line (see Figure 4-4). The Availability Manager displays a shortcut
menu with the Circuit Fix option.


This page (Figure 6-21) allows you to change the cost associated with this circuit, 
which, in turn, affects the routing of cluster traffic. 


Figure 6-21 Circuit Adjust Priority


[Screenshot: Circuit Adjust Priority fix page for circuit EBJB27 PEA0: to
BONU2. Explanation: "Adjust the management priority for the Circuit. This fix
changes the cost associated with this Circuit which in turn affects the
routing of cluster traffic." Fix Value: 0. Buttons: OK, Cancel, Apply.]
6.4.3 LAN Virtual Circuit Fixes


To access LAN virtual circuit fixes, right-click a data item in the LAN Virtual 
Circuit Summary category (see Figure 4-6), or use the Fix menu on the LAN 
Device Details... page. 


The Availability Manager displays a shortcut menu with the following options: 
• Channel Summary

• VC LAN Details...

• VC LAN Fix...


When you select VC LAN Fix..., the Availability Manager displays the first of 
several fix pages. Use the Fix Type box to select one of the following LAN VC 
fixes: 


• Maximum Transmit Window Size

• Maximum Receive Window Size

• Checksumming

• Compression

• ECS Maximum Delay

These fixes are described in the following sections. 


6.4.3.1 LAN VC Checksumming Fix 


The LAN VC Checksumming fix (Figure 6-22) allows you to turn checksumming 
on or off for the virtual circuit. 


Figure 6-22 LAN VC Checksumming 


[Screenshot: Checksumming fix page for virtual circuit CMOVEQ PEA0: to BRICKS.
Explanation: "Turn Checksumming on or off for the Virtual Circuit. This
Virtual Circuit fix may not be available on all target systems." Fix Value:
ON. Buttons: OK, Cancel, Apply, Help.]
6.4.3.2 LAN VC Maximum Transmit Window Size Fix


The LAN VC Maximum Transmit Window Size fix (Figure 6-23) allows you to adjust
the maximum transmit window size for the virtual circuit.


Figure 6-23 LAN VC Maximum Transmit Window Size


[Screenshot: Maximum Transmit Window Size fix page for virtual circuit CMOVEQ
PEA0: to BRICKS. Explanation: "Adjust the Maximum Transmit Window Size for the
Virtual Circuit. This Virtual Circuit fix may not be available on all target
systems." A Fix Value slider (scale up to 255) sets the new size. Buttons: OK,
Cancel, Apply.]
6.4.3.3 LAN VC Maximum Receive Window Size Fix


The LAN VC Maximum Receive Window Size fix (Figure 6-24) allows you to 
adjust the maximum receive window size for the virtual circuit. 


Figure 6-24 LAN VC Maximum Receive Window Size


[Screenshot: Maximum Receive Window Size fix page for virtual circuit CMOVEQ
PEA0: to BRICKS. Explanation: "Adjust the Maximum Receive Window Size for the
Virtual Circuit. This Virtual Circuit fix may not be available on all target
systems." A Fix Value slider (scale up to 255) sets the new size. Buttons: OK,
Cancel, Apply.]
6.4.3.4 LAN VC Compression Fix


The LAN VC Compression fix (Figure 6-25) allows you to turn compression on or 
off for the virtual circuit. This fix, however, might not be available on all target 
systems. 


Figure 6-25 LAN VC Compression 


[Screenshot: Compression fix page for virtual circuit CMOVEQ PEA0: to BRICKS.
Explanation: "Turn Compression on or off for the Virtual Circuit. This Virtual
Circuit fix may not be available on all target systems." Buttons: OK, Cancel,
Apply, Help.]
6.4.3.5 LAN VC ECS Maximum Delay Fix


The LAN VC ECS Maximum Delay fix (Figure 6-26) sets a management-specified
limit on the maximum delay (in microseconds) that an ECS (equivalent channel set)
member channel can have. You can set a value between 0 and 3000000. Zero
disables a prior management delay setting.


You can use this fix to override the delay thresholds that PEdriver calculates
automatically. Doing so ensures that all channels with delays less than the value
supplied are included in the VC's ECS.


Figure 6-26 LAN VC ECS Maximum Delay 


[Screenshot: ECS Maximum Delay fix page for the BRICKS virtual circuit to
CMOVEQ. The Fix Type box shows "Demote Thres...", the Explanation box repeats
the description given above, and the Fix Value is 3000000. Buttons: OK,
Cancel, Apply, Help.]


On the sample page shown in Figure 6-26, you cannot read the following text
(which is displayed when you move the slider down): “The fix operates as
follows: Whenever at least one tight peer channel has a delay of less than
the management-supplied value, all tight peer channels with delays less than
the management-supplied value are automatically included in the ECS. When all
tight peer channels have delays equal to or greater than the management setting,
the ECS membership delay thresholds are automatically calculated and used.


You must determine an appropriate value for your configuration by
experimentation. An initial value of 2000 (2ms) to 5000 (5ms) is suggested.”
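The membership rule quoted above can be expressed as a short function. This is a hedged sketch of the rule as the fix page describes it, not PEdriver code: the function name and channel names are hypothetical, and `auto_threshold` is an assumed stand-in for PEdriver's automatically calculated threshold, whose actual calculation this guide does not describe.

```python
def ecs_members(delays: dict[str, float], mgmt_delay: int,
                auto_threshold: float) -> set[str]:
    """Return the tight peer channels included in the VC's ECS.

    delays         -- average delay per tight peer channel, in microseconds
    mgmt_delay     -- ECS Maximum Delay fix value (0 to 3000000; 0 = disabled)
    auto_threshold -- assumed stand-in for PEdriver's calculated threshold
    """
    if not 0 <= mgmt_delay <= 3_000_000:
        raise ValueError("value must be between 0 and 3000000")
    if mgmt_delay > 0 and any(d < mgmt_delay for d in delays.values()):
        # At least one channel is below the management value, so every
        # channel whose delay is below that value joins the ECS.
        return {ch for ch, d in delays.items() if d < mgmt_delay}
    # Setting disabled, or all delays at or above it: use the automatic threshold.
    return {ch for ch, d in delays.items() if d <= auto_threshold}


delays = {"A": 800, "B": 1500, "C": 4500}  # hypothetical channels
print(sorted(ecs_members(delays, mgmt_delay=2000, auto_threshold=1000)))  # ['A', 'B']
print(sorted(ecs_members(delays, mgmt_delay=500, auto_threshold=1000)))   # ['A']
```

With the suggested starting value of 2000, channel B joins the ECS even though it is slower than the assumed automatic threshold would allow.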


The page shown in Figure 6-26 also displays the following caution:


Caution 


By overriding the automatic delay calculations, you can include a channel 
in the ECS whose average delay is consistently greater than 1.5 to 2 times 
the average delay of the fastest channels. When this occurs, the overall 
VC throughput becomes the speed of the slowest ECS member channel. 
An extreme example is when the management delay permits a 10Mb/sec 
Ethernet channel to be included with multiple 1Gb/sec channels. The 
resultant VC throughput drops to 10Mb/sec. 
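The arithmetic behind this caution can be sketched with the same simplification the caution uses: VC throughput becomes the speed of the slowest ECS member channel. The function name and channel speeds below are illustrative assumptions. Real throughput also depends on factors this model ignores.

```python
def vc_throughput_mbps(member_speeds_mbps: list[float]) -> float:
    """Simplified model: the VC runs at the speed of its slowest ECS member."""
    if not member_speeds_mbps:
        raise ValueError("the ECS has no member channels")
    return min(member_speeds_mbps)


fast_only = [1000, 1000, 1000]  # three 1Gb/sec channels
with_slow = fast_only + [10]    # a 10Mb/sec channel admitted by the override
print(vc_throughput_mbps(fast_only))  # 1000
print(vc_throughput_mbps(with_slow))  # 10
```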
6.4.4 LAN Channel Fixes


To access LAN path fixes, right-click an item on a LAN Path (Channel) Summary 
line (see Figure 4-6). The Availability Manager displays a shortcut menu with 
the following options: 


• Channel Details...
• LAN Device Details...
• Fixes...


Click Fixes... or use the Fix menu on the Channel Details page. The Availability 
Manager displays a page with the following Fix Types: 


• Adjust Priority
• Hops
• Max Packet Size


These fixes are described in the following sections. 
6.4.4.1 LAN Path (Channel) Adjust Priority Fix


The LAN Path (Channel) Adjust Priority fix (Figure 6-27) allows you to change 
the cost associated with this channel by adjusting its priority. This, in turn, 
affects the routing of cluster traffic. 


Figure 6-27 LAN Path (Channel) Adjust Priority


[Screenshot: Adjust Priority fix page for channel CMOVEQ (EWA) to BRICKS
(EWA). Explanation: "Sets the management priority value for the channel. The
priority can be a value between -128 and +127. Suggested values are: 2 to
cause channels to be preferred; -2 to exclude channels. CAUTION: If you set the
priority of all channels to -128, you will totally disable use of the LAN for
cluster communication. This can cause the system to CLUEXIT. NOTE: Starting
with OpenVMS Version 7.3-2, a channel whose priority is -128 is not used for
cluster communications. The priority of a channel is the sum of the management
priority assigned to the local LAN device and the channel itself. Therefore,
you can assign any combination of channel and LAN device management priority
values to achieve a total of -128." Buttons: OK, Cancel, Apply.]
6.4.4.2 LAN Path (Channel) Hops Fix


The LAN Path (Channel) Hops fix (Figure 6-28) allows you to change the hops value
for the channel. This change, in turn, affects the routing of cluster traffic.


Figure 6-28 LAN Path (Channel) Hops 


[Screenshot: Hops fix page for channel CMOVEQ (EWA) to BRICKS (EWA).
Explanation: "Adjust the hops for the Channel. This fix changes the hops value
associated with this Channel which in turn affects the routing of cluster
traffic." A Fix Value slider (scale up to 255) sets the new value. Buttons: OK,
Cancel, Apply.]
6.4.5 LAN Device Fixes


To access LAN device fixes, right-click an item in the LAN Path (Channel) 
Summary category (see Figure 4-6). The Availability Manager displays a 
shortcut menu with the following options: 


• Channel Details...
• LAN Device Details...
• Fixes...


Select LAN Device Details to display the LAN Device Details window. From 
the Device Details window, select Fix... from the Fix menu. (These fixes are also 
accessible from the LAN Device Summary page.) 


The Availability Manager displays the first of several pages, each of which 
contains a fix option: 

Adjust Priority 

Set Max Buffer Size 

Start LAN Device 

Stop LAN Device 


These fixes are described in the following sections. 
6.4.5.1 LAN Device Adjust Priority Fix


The LAN Device Adjust Priority fix (Figure 6-29) allows you to adjust the 
management priority for the device. This fix changes the cost associated with 
this device, which, in turn, affects the routing of cluster traffic. 


Starting with OpenVMS Version 7.3-2, a channel whose priority is -128 is
not used for cluster communications. The priority of a channel is the sum of
the management priority assigned to the local LAN device and the channel
itself. Therefore, you can assign any combination of channel and LAN device
management priority values to arrive at a total of -128.
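The priority arithmetic described above can be sketched as follows. This is an illustrative model, not OpenVMS code. The function names and channel names are hypothetical. Each management priority is checked against the -128 to +127 range shown on the fix pages. The sketch also assumes that a total below -128 is treated like -128, because this guide does not say how such totals are handled. An empty result corresponds to the situation the CAUTION describes: the LAN is disabled for cluster communication, and the system can CLUEXIT.

```python
PRIORITY_MIN, PRIORITY_MAX = -128, 127
UNUSED = -128  # OpenVMS Version 7.3-2 and later: not used for cluster traffic


def channel_priority(device_mgmt: int, channel_mgmt: int) -> int:
    """Channel priority = LAN device management priority + channel's own."""
    for value in (device_mgmt, channel_mgmt):
        if not PRIORITY_MIN <= value <= PRIORITY_MAX:
            raise ValueError(f"{value} is outside -128..+127")
    return device_mgmt + channel_mgmt


def usable_channels(channels: dict[str, tuple[int, int]]) -> list[str]:
    """Channels still usable for cluster communication.

    Assumption: a total at or below -128 is treated as unused.
    """
    return [name for name, (dev, chan) in channels.items()
            if channel_priority(dev, chan) > UNUSED]


channels = {
    "A": (-64, -64),  # total -128: excluded
    "B": (0, 2),      # total 2: preferred
    "C": (-128, 0),   # device priority set to -128: excluded
}
print(usable_channels(channels))  # ['B']
```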


Figure 6-29 LAN Device Adjust Priority 
[Screenshot: Adjust Priority fix page for LAN device CMOVEQ (EWA).
Explanation: "Sets the management priority value for the LAN device. The
priority can be a value between -128 and +127. Suggested values are: 2 to
cause devices to be preferred; -2 to exclude devices. CAUTION: If you set the
priority of all devices to -128, you will totally disable use of the LAN for
cluster communication. This can cause the system to CLUEXIT. NOTE: Starting
with OpenVMS Version 7.3-2, a channel whose priority is -128 is not used for
cluster communications. The priority of a channel is the sum of the management
priority assigned to the local LAN device and the channel itself. Therefore,
you can assign any combination of channel and LAN device management priority
values to achieve a total of -128." Buttons: OK, Cancel, Apply, Help.]
6.4.5.2 LAN Device Set Maximum Buffer Fix


The LAN Device Set Maximum Buffer fix (Figure 6-30) allows you to set the 
maximum packet size for the device, which changes the maximum packet size 
associated with this channel. This change, in turn, affects the routing of cluster 
traffic. 


Figure 6-30 LAN Device Set Maximum Buffer Size 


[Screenshot: Set Max Buffer Size fix page for LAN device CMOVEQ (EWA).
Explanation: "Set the maximum packet size for the LAN Device. This fix changes
the maximum packet size associated with this Channel which in turn affects the
routing of cluster traffic." Buttons: OK, Cancel, Apply, Help.]
6.4.5.3 LAN Device Start Fix


The LAN Device Start fix (Figure 6-31) starts the use of this particular LAN
device and, at the same time, enables the device for cluster traffic.


Figure 6-31 LAN Device Start 


[Screenshot: Start LAN Device fix page for LAN device CMOVEQ (EWA).
Explanation: "Starts use of this LAN Device. This fix enables the use of this
LAN Device for cluster traffic." Buttons: OK, Cancel, Apply, Help.]
6.4.5.4 LAN Device Stop Fix


The LAN Device Stop fix (Figure 6-32) stops the use of this particular LAN 
device. At the same time, this fix disables this device for cluster traffic. 


Caution 


This fix could result in interruption of cluster communications for this 
node. The node might exit the cluster (CLUEXIT crash). 


Figure 6-32 LAN Device Stop 


[Screenshot: Stop LAN Device fix page for LAN device CMOVEQ (EWA).
Explanation: "Stop use of this LAN Device. This fix disables the use of this
LAN Device for cluster traffic. CAUTION: This fix could result in interruption
of cluster communications for this node to the extent that it exits the
cluster (i.e., CLUEXIT crash)." Buttons: OK, Cancel, Apply, Help.]
Customizing the Availability Manager


This chapter explains how to customize the following Availability Manager
features:


Feature               Description

Nodes or node groups  You can select one or more groups or individual nodes
                      to monitor.

Data collection       For OpenVMS nodes, you can choose the types of data you
                      want to collect as well as set several types of
                      collection intervals. (On Windows nodes, specific types
                      of data are collected by default.)

Data filters          For OpenVMS nodes, you can specify a number of
                      parameters and values that limit the amount of data
                      that is collected.

Event escalation      You can customize the way events are displayed in the
                      Event pane of the System Overview window (Figure 2-1),
                      and you can configure events to be signaled to OPCOM
                      and OpenView.

Event filters         You can specify the severity of events that are
                      displayed as well as several other filter settings for
                      events.

Security              On Data Analyzer and Data Collector nodes, you can
                      change passwords. On OpenVMS Data Collector nodes, you
                      can edit a file that contains security triplets.

Watch process         You can specify up to eight processes for the
                      Availability Manager to monitor and report on if they
                      exit and also if they are subsequently created.


In addition, you can change the group membership of nodes, as explained in 
Section 7.4.1 and Section 7.4.2. 


Table 7-1 shows the levels of customization that the Availability Manager provides
and the features that you can customize at each level.
Table 7-1 Levels of Customization


                                       Operating
Customizable Features    Application   System      Group   Node
Nodes or node groups     X
Data collection                        X           X       X
Data filters                           X           X       X
Event escalation         X             X           X       X
Event filters                          X           X       X
Security                               X           X       X
Watch process                          X           X       X


7.1 Understanding Levels of Customization 


You can customize each feature at one or more of the following levels, as shown
in Table 7-1:

• Application
• Operating System
• Group
• Node


In addition to the four levels of customization, there are Availability Manager
Defaults (AM Defaults), which are the most general, built-in values that are
preset (hardcoded) within the Availability Manager. Users cannot change these
settings. If no customizations are made at any of the four levels, the AM Default
values are used.


The following list describes the four levels of customization. 


• Application values override AM Defaults for nodes and groups of nodes as
  well as for event escalation (unless overriding customizations are made at
  the operating system, group, or node levels).

• Operating System values override Application values for event escalation.
  Operating System values override AM Defaults for the remaining features
  shown in Table 7-1.

• Group values override Operating System and Application values as well as
  AM Defaults.

• Node values override Group, Operating System, and Application values, as
  well as AM Defaults.


Any of these four levels of customization overrides AM Defaults. Also, 
customizing values at any successive level overrides the value set at the previous 
level. For example, customizing values for Data filters at the Group level 
overrides values for Data filters set at the Operating System level. Similarly, 
customizing values for Data filters at the Node level overrides values for Data 
filters set at the Group level. 
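The override rules above amount to a most-specific-first lookup. The following is a minimal sketch, not the Availability Manager's implementation; the function `resolve`, the dictionary layout, and the numeric filter values are hypothetical. It assumes, per the list above, that Application values take effect only for nodes or node groups and for event escalation.

```python
from typing import Any, Dict, Optional, Tuple

# Customization levels, most specific first; AM Defaults apply last.
LEVELS = ("Node", "Group", "Operating System", "Application")

# Features for which Application-level values take effect, per the list
# of levels: nodes or node groups, and event escalation.
APPLICATION_FEATURES = {"Nodes or node groups", "Event escalation"}


def resolve(feature: str,
            overrides: Dict[str, Optional[Any]],
            am_default: Any) -> Tuple[Any, str]:
    """Return (value, level) in effect for one feature on one node.

    `overrides` maps a level name to the value customized at that level;
    a missing key or None means the level has no customization.
    """
    for level in LEVELS:
        if level == "Application" and feature not in APPLICATION_FEATURES:
            continue  # Application values do not apply to this feature
        value = overrides.get(level)
        if value is not None:
            return value, level
    return am_default, "AM Defaults"
```

For example, `resolve("Data filters", {"Operating System": 4, "Group": 6}, am_default=2)` returns `(6, "Group")`: the Group value wins over the Operating System value. Deleting the "Group" entry and resolving again yields the Operating System value, which mirrors the reset behavior described in Section 7.1.2.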
The customization levels for various Availability Manager values are displayed
as icons on some pages. The OpenVMS Data Collection Customization page 
(Figure 7-1) displays several of these icons. 


Figure 7-1 OpenVMS Data Collection Customization 


[Screen image: Customization - Settings for OpenVMS node AFFS7, with Data Collection, Filter, and Security tabs, a Use default values button, the list of data collection choices with their intervals, and a legend explaining the customization-level icons]


The icons preceding each data item in Figure 7-1 indicate the current
customization level for each collection choice. Table 7-2 describes these icons
and tells where each appears in Figure 7-1.


Table 7-2 Customization Icons in Figure 7-1 


Icon               Location                    Meaning

Graph              Before “Disk volume”        Current setting is from the built-in AM
                                               Defaults.

Magnifying glass   Bottom left of window       Current setting is from the Application
                                               level.

Swoosh             Before “Disk status”        Current setting has been modified at the
                                               OpenVMS Operating System level.

Double monitors    Before “Cluster summary”    Current setting has been modified at the
                                               group level.

Single monitor     Before “Memory”             Current setting has been modified at the
                                               node level.
7.1.2 Setting Levels of Customization


When you customize a value, the Availability Manager keeps track of the value
set at the next higher level, so you can reset the value to that higher-level
setting.


To return to the values set at the preceding level, click the Use default
values button at the top of a customization page. The icon on the “Use default
values” button and the explanation at the bottom of the page indicate the
previous customization level.


In the main System Overview window (see Figure 2-1), you can select the 
customization levels that are shown in Table 7-1. The following sections explain 
levels of customization in more detail. 


7.1.3 Knowing the Number of Nodes Affected by Each Customization Level


Another way of looking at Availability Manager customization is to consider the 
number of nodes affected by each level of customization. Depending on which 
customization menu you use and your choice of menu items, your customizations 
can affect one or more nodes, as indicated in the following table. 


Nodes Affected       Action

All nodes            Select Customize Application... on the menu shown in
                     Figure 7-2.

All Windows nodes    Select Operating Systems -> Customize Windows NT... on
                     the menu shown in Figure 7-2.

All OpenVMS nodes    Select Operating Systems -> Customize OpenVMS... on
                     the menu shown in Figure 7-2.

Nodes in a group     Select Customize... on the shortcut menu shown in
                     Figure 7-7. The customization options you choose affect only
                     the group of nodes that you select.

One node             Select Customize... on the shortcut menu shown in
                     Figure 7-8 or on the Customize shortcut menu on the Node
                     page. The customization options you choose affect only the
                     node that you select.


7.2 Customizing Settings at the Application and Operating System Levels


In the System Overview window menu bar, select Customize. The Availability 
Manager displays the shortcut menu shown in Figure 7-2. 
Figure 7-2 Application and Operating System Customization Menu


[Screen image: HP Availability Manager V2.6 System Overview window with the Customize menu open, showing Customize Application..., Customize OpenVMS..., and Customize Windows NT...]


7.2.1 Customizing Application Settings 


When you select Customize Application..., by default the Availability Manager 
displays the Group/Nodes Lists page (Figure 7-3), where the Inclusion lists tab 
is the default. 


Note 


The Event Escalation tab displayed on the Application Settings page 
(Figure 7-3) is explained in Section 7.7. 


7.2.1.1 Application Settings—Groups/Nodes Inclusion Page 


On the Groups/Nodes Inclusion page (Figure 7-3) you can select groups of nodes 
or individual nodes to be displayed. 


Figure 7-3 Application Settings—Groups/Nodes Inclusion 
[Screen image: Customization - Application Settings, Group/Node Lists tab, Inclusion lists, with Group List entries DECAMDS, Debug cluster, KOINE, KOINE2, and INT Nodes, and an on-screen explanation of the Group List and Node List check boxes]




On the Groups/Nodes Inclusion page, you have the following choices: 


• Group List

  Select the Group List check box. Then enter the names of the groups of
  nodes you want to monitor. (The names are case-sensitive, so be sure to enter
  the correct case.)

  For instructions on changing the group membership of a node, see
  Section 7.4.1 and Section 7.4.2.

• Node List

  Select the Node List check box. Then enter the names of individual nodes
  you want to monitor. (The names are case-sensitive, so be sure to enter the
  correct case.)

• Both Group List and Node List

  If you select both check boxes, you can enter the names of groups of nodes as
  well as individual nodes you want to monitor. (If you enter the name of an
  individual node, the Availability Manager displays the name of the group that
  the node is in, but no additional nodes in that group.)

• Neither list

  The Group List and Node List are not used; all groups and all nodes are
  monitored.


If you decide to return to the default (Group List: DECAMDS) or to enter names 
again, select Use default values. 
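The inclusion choices above can be summarized as a small predicate. This is a hypothetical sketch for illustration only, not part of the product: a cleared check box is modeled as `None`, names are compared case-sensitively as the page requires, and when both boxes are checked a node is monitored if either its group or its own name is listed.

```python
from typing import Optional, Set


def monitored_by_inclusion(node: str, group: str,
                           group_list: Optional[Set[str]] = None,
                           node_list: Optional[Set[str]] = None) -> bool:
    """Apply the Inclusion lists to one node (None = check box cleared)."""
    if group_list is None and node_list is None:
        return True  # neither list is used: all groups and nodes are monitored
    # Case-sensitive membership tests, matching the page's rules.
    in_group_list = group_list is not None and group in group_list
    in_node_list = node_list is not None and node in node_list
    return in_group_list or in_node_list
```

For instance, with the default Group List of DECAMDS, a node in group `decamds` (lowercase) would not be monitored, because the names differ in case.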


After you enter a list of nodes or groups of nodes, click one of the following 
buttons at the bottom of the page: 


Option Description 

OK Accepts the choice of names you have entered and exits the page. 

Cancel Cancels the choice of names and does not exit the page. 

Apply Accepts the choice of names you have entered but does not exit the 
page. 


If nodes were previously selected for monitoring, their names are not removed 
from the display even if you click OK or Apply. They are filtered out the next 
time the Availability Manager is started. 


7.2.1.2 Application Settings—Groups/Nodes Exclusion Lists 
As an alternative to the Inclusion lists on the Groups/Nodes Inclusion page, you
can click the Exclusion lists tab in Figure 7-4, where you can select groups of
nodes or individual nodes to be excluded from display.
Figure 7-4 Application Settings—Groups/Nodes Exclusion Lists


[Screen image: Customization - Application Settings, Group/Node Lists tab, Exclusion lists, with Group List and Node List check boxes and an on-screen explanation of their use]


On the Groups/Nodes Exclusion Lists page, you have the following choices: 
• Group List


Select the Group List check box. Then enter the names of the groups of
nodes you want to exclude from monitoring. (The names are case-sensitive, so
be sure to enter the correct case.)


For instructions on changing the group membership of a node, see
Section 7.4.1 and Section 7.4.2.


• Node List


Select the Node List check box. Then enter the names of individual nodes
you want to exclude from monitoring. (The names are case-sensitive, so be
sure to enter the correct case.)


• Both Group List and Node List


If you select both check boxes, you can enter the names of groups of nodes as
well as individual nodes you want to exclude from monitoring. (If you enter
the name of an individual node, the Availability Manager displays the name
of the group that the node is in, but no additional nodes in that group.)


• Neither box


The Group List and Node List are not used; all groups and all nodes are
monitored.
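The exclusion rules are the mirror image of the inclusion rules: a node is dropped if its group or its own name appears in a checked list. The following hypothetical sketch models a cleared check box as `None` and compares names case-sensitively; it illustrates the documented rules and is not the product's code.

```python
from typing import Optional, Set


def monitored_by_exclusion(node: str, group: str,
                           group_list: Optional[Set[str]] = None,
                           node_list: Optional[Set[str]] = None) -> bool:
    """Apply the Exclusion lists to one node (None = check box cleared)."""
    if group_list is not None and group in group_list:
        return False  # the node's whole group is excluded
    if node_list is not None and node in node_list:
        return False  # this individual node is excluded
    return True  # not listed, or neither box is checked: node is monitored
```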


After you enter a list of nodes or groups of nodes, click one of the buttons at the 
bottom of the page: 
Option   Description

OK       Accepts the choice of names you have entered and exits the page.

Cancel   Cancels the choice of names and does not exit the page.

Apply    Accepts the choice of names you have entered but does not exit the
         page.


If nodes were previously selected for monitoring, their names are not removed 
from the display even if you click OK or Apply to exclude them from monitoring. 


7.2.2 Customizing Windows Operating System Settings 


When you select Customize Windows NT..., the Availability Manager displays a 
page similar to the one shown in Figure 7-5. 


Figure 7-5 Windows Operating System Customization 


[Screen image: Customization - Windows NT Default Settings, Events tab, showing the CFGDON (configuration done) event with its severity, occurrence, threshold, escalation actions (User, OPCOM, HP OpenView), and event explanation]


The default page displayed is the Event Customization page. Instructions for 
using this page are in Section 7.8.1. The other tabs displayed are the Event 
Escalation page, which is explained in Section 7.7, and the Windows Security 
Customization page, which is explained in Section 7.9.2.2. 


7.2.3 Customizing OpenVMS Operating System Settings


When you select Customize OpenVMS..., the Availability Manager displays
the page shown in Figure 7-6, which contains tabs for the last six types of
customization listed in Table 7-1. (Instructions for making these types of
customizations are later in this chapter, beginning in Section 7.5.)
Figure 7-6 OpenVMS Operating System Customization


[Screen image: Customization - OpenVMS Default Settings, listing data collection choices such as Disk status, Disk volume, and I/O, with a legend explaining the customization-level icons]


7.3 Customizing Settings at the Group Level 


To perform customizations at the group level, right-click a group name in the 
System Overview window. The Availability Manager displays a small menu 
similar to the one shown in Figure 7-7. 


Figure 7-7 Group Customization Menu 


[Screen image: HP Availability Manager V2.6 System Overview window, showing the shortcut menu displayed when you right-click the OpenVMS group]


When you select Customize, the Availability Manager displays a page similar to
the one shown in Figure 7-6.
7.4 Customizing Settings at the Node Level
To customize a specific node, do either of the following:
• Select the Customize option at the top of the Group/Node page.


• Right-click a node name in the Node pane of the System Overview window
(see Figure 2-1).


The Availability Manager displays the shortcut menu shown in Figure 7-8. 


Note 


You can customize nodes in any state. 


Figure 7-8 Node Customization Menu 


[Screen image: HP Availability Manager V2.6, showing the node shortcut menu with options including Display... and Fix Options...]


When you select Customize, the Availability Manager displays a customization 
page similar to the one shown in Figure 7-6. 


7.4.1 Changing the Group of an OpenVMS Node 


Each Availability Manager Data Collector node is assigned to the DECAMDS 
group by default. 


Note 


You need to place nodes that are in the same cluster in the same group. 
If such nodes are placed in different groups, some of the data collected 
might be misleading. 


To change the group for a Data Collector node, you need to edit a logical name
definition on that node. To do this, follow these steps:


1. Assign a unique name of up to 15 alphanumeric characters
to the AMDS$GROUP_NAME logical name in the
AMDS$AM_SYSTEM:AMDS$LOGICALS.COM file. For example:


$ AMDS$DEF AMDS$GROUP_NAME FINANCE ! Group FINANCE; OpenVMS Cluster alias

2. Apply the logical name by restarting the Data Collector:


$ @SYS$STARTUP:AMDS$STARTUP RESTART
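Because a mistyped group name silently places a node in a different group, it can help to check names before editing AMDS$LOGICALS.COM on several nodes. The sketch below is a hypothetical helper, not an HP tool. It assumes "alphanumeric" means ASCII letters and digits only, and it cannot check the uniqueness requirement, which depends on your site.

```python
import re

# Step 1 allows up to 15 alphanumeric characters.
# Assumption: "alphanumeric" means ASCII letters and digits only.
GROUP_NAME_RE = re.compile(r"[A-Za-z0-9]{1,15}")


def is_valid_group_name(name: str) -> bool:
    return GROUP_NAME_RE.fullmatch(name) is not None


def amds_group_line(name: str) -> str:
    """Build the AMDS$LOGICALS.COM line that assigns AMDS$GROUP_NAME."""
    if not is_valid_group_name(name):
        raise ValueError(f"group name must be 1-15 alphanumeric characters: {name!r}")
    return f"$ AMDS$DEF AMDS$GROUP_NAME {name}"
```

For example, `amds_group_line("FINANCE")` produces the first line of the step 1 example, without the trailing comment.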
7.4.2 Changing the Group of a Windows Node


Note 


These instructions apply to versions prior to Version 2.0-1. 


You need to edit the Registry to change the group of a Windows node. To edit the 
Registry, follow these steps: 


1. Click the Windows Start button. On the menu displayed, first select
Programs, then Accessories, and then Command Prompt.

2. Type REGEDIT after the angle prompt (>).

The system displays a screen for the Registry Editor, with a list of entries
under My Computer.

3. On the list displayed, expand the HKEY_LOCAL_MACHINE entry.

4. Double-click SYSTEM.

5. Click CurrentControlSet.

6. Click Services.

7. Click damdrvr.

8. Click Parameters.

9. Double-click Group Name. Then type a new group name of 15 alphanumeric
characters or fewer, and click OK to make the change.

10. On the Control Panel, select Services, and then select Stop for “PerfServ.”

11. Again on the Control Panel, select Devices, and then select Stop for
“damdrvr.”

12. First restart damdrvr under “Devices,” and then restart PerfServ under
“Services.”


This step completes the change of groups for this node. 


7.5 Customizing OpenVMS Data Collection 


Note 


Before you start this section, be sure to read the explanation of data 
collection, events, thresholds, and occurrences in Chapter 1. Also, be sure 
you understand background and foreground data collection. 


When you choose the Customize OpenVMS menu option in the System
Overview window (see Figure 7-2), by default the Availability Manager displays
the OpenVMS Data Collection Customization page (Figure 7-9) where you can 
select types of data you want to collect for all of the OpenVMS nodes you are 
currently monitoring. You can also change the default Availability Manager 
intervals at which data is collected or updated. 
Figure 7-9 OpenVMS Data Collection Customization


[Screen image: OpenVMS Data Collection Customization page, with a legend explaining the customization-level icons]


Table 7-3 identifies the page on which each type of data collected and displayed
in Figure 7-9 appears and indicates whether or not background data collection 
is turned on for that type of data collection. See Chapter 1 for information about 
background data collection. (You can also customize data collection at the group 
and node levels, as explained in Section 7.1.) 


Note 


When you select a type of data collection, an icon appears on the
“Use default values” button indicating the previous (higher) level of
customization where customizations might have been made. Pressing the
“Use default values” button followed by the “Apply” button causes any
customizations made at the current level to be discarded and the values
from the previous level of customization to be used.


You can select more than one collection choice using the Shift or Ctrl
key. In this case, no icon appears on the “Use default values”
button. Pressing the “Use default values” button causes each selected
collection choice to be reset to the value at its own previous level of
customization.




Table 7-3 Data Collection Choices 


                     Background Data
Data Collected       Collection Default   Page Where Data Is Displayed
Cluster summary      No                   Cluster Summary page
CPU mode             No                   CPU Modes Summary page
CPU summary          No                   CPU Process States page
Disk status          No                   Disk Status Summary page
Disk volume          No                   Disk Volume Summary page
I/O data             No                   I/O Summary page
Lock contention      No                   Lock Contention page
Memory               No                   Memory Summary page
Node summary         Yes                  Node pane, Node Summary page, and the top
                                          pane of the CPU, Memory, and I/O pages
Page/Swap file       No                   I/O Page Faults page
Single disk          Yes¹                 Single Disk Summary page
Single process       Yes²                 Data collection for the Process Information page


¹Data is collected by default when you open a Single Disk Summary page.
²Data is collected by default when you open a Single Process page.


You can choose additional types of background data collection by selecting the 
Collect check box for each one on the Data Collection Customization page of the 
Customize OpenVMS... menu (Figure 7-6). A check mark indicates that data 
is to be collected at the intervals described in Table 7-4. 


Table 7-4 Data Collection Intervals 


Interval Name   Description

Display         How often the data is collected when its corresponding display is
                active.

Event           How often the data is collected when its corresponding display is
                not active and when events are active.

NoEvent         How often the data is collected when its corresponding display is
                not active and when events are not active.
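Table 7-4 describes a three-way choice of interval. The sketch below shows that choice under stated assumptions: the class and function names are hypothetical, the interval values are treated as seconds, and, when the Collect check box is cleared, data is assumed to be collected only while its display is open (that is, no background collection).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intervals:
    display: float   # corresponding display is active
    event: float     # display not active, events active
    no_event: float  # display not active, events not active


def collection_interval(iv: Intervals, collect: bool,
                        display_active: bool, events_active: bool) -> Optional[float]:
    """Pick the interval from Table 7-4; None means no collection is scheduled.

    Assumption: without the Collect check box, data is collected only while
    its display is open (no background collection).
    """
    if display_active:
        return iv.display
    if not collect:
        return None
    return iv.event if events_active else iv.no_event
```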


You can enter a different collection interval by selecting a row of data and
then selecting the interval value you want to change. Delete the old value and
enter a new one.


If you change your mind and decide to return to the default collection interval,
select one or more rows of data items; then select Use default values. The
system displays the default values for all the collection intervals.


When you finish customizing your data collection, click one of the following 
buttons at the bottom of the page: 
Option Description

OK To confirm any changes you have made and exit the page. 

Cancel To cancel any changes you have made and exit the page. 

Apply To confirm and apply any changes you have made and not exit the page. 


7.6 Customizing OpenVMS Data Filters 


When you choose “Customize” at the operating system, group, or node level and
then select the Filter tab, the Availability Manager displays pages that allow you
to customize data filters (see Figure 7-10). The types of data filters available are
the following:


• CPU

• Disk Status

• Disk Volume

• I/O

• Lock Contention
• Memory

• Page/Swap File


Filters can vary depending on the type of data collected. For example, filters 
might be process states or a variety of rates and counts. The following sections 
describe data filters that are available for various types of data collection. 


You can also customize filters at the group and node levels (see Section 7.1). 


Keep in mind that the customizations that you make at the various levels 
override the ones set at the previous level (see Table 7-1). The icons preceding 
each data item (see Table 7-2) indicate the level at which the data item was
customized. In Figure 7-10, for example, the icon preceding “CPU” indicates that 
the current setting comes from the AM Defaults. 


If you change your mind and decide to return to filter values set at the previous 
level, select Use default values. The icon appearing on the button indicates the 
level of the previous values. In Figure 7-10, for example, the previous value is 
the AM Defaults value. 


When you finish modifying filters on a page, click one of the following buttons at 
the bottom of the page: 


Option Description 

OK To confirm any changes you have made and exit the page. 

Cancel To cancel any changes you have made and exit the page. 

Apply To confirm and apply any changes you have made and continue to display the 
page. 
7.6.1 OpenVMS CPU Filters


When you select “CPU” on the Filter tabs, the Availability Manager displays the 
OpenVMS CPU Filters page (Figure 7-10). 


Figure 7-10 OpenVMS CPU Filters 
The OpenVMS CPU Filters page allows you to change and select values that are
displayed on the OpenVMS CPU Process States page (Figure 3-8). 


You can change the minimum Current Priority and CPU rate that a process must
have to be displayed. By default, a process is displayed only if it has a Current
Priority of 4 or more. Click the up or down arrow to increase or decrease the
priority value by one. The default CPU rate is 0.0, which means that processes
with any CPU rate are displayed. To limit the number of processes displayed, you
can click the up or down arrow to increase or decrease the CPU rate by 0.5 each
time you click.


The OpenVMS CPU Filters page also allows you to select the states of the 
processes that you want to display on the CPU Process States page. Select the 
check box for each state you want to display. (Process states are described in
Appendix A.)
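Taken together, the CPU filter settings act as three conditions that a process must all satisfy. The following sketch is a hypothetical model of that filtering, not the Availability Manager's code. It assumes the priority and CPU-rate thresholds are inclusive minimums (which matches "a Current Priority of 4 or more" and "any CPU rate" at 0.0), and it models the checked process states as a set, with `None` meaning all states are shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Process:
    name: str
    current_priority: int
    cpu_rate: float
    state: str  # process state, for example "COM" or "LEF"


def cpu_filter(processes: Iterable[Process], min_priority: int = 4,
               min_cpu_rate: float = 0.0,
               states: Optional[Set[str]] = None) -> List[Process]:
    """Keep processes that meet every CPU filter condition (defaults as in the text)."""
    return [p for p in processes
            if p.current_priority >= min_priority
            and p.cpu_rate >= min_cpu_rate
            and (states is None or p.state in states)]
```

Raising `min_cpu_rate` in steps of 0.5 corresponds to clicking the up arrow on the page.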


7.6.2 OpenVMS Disk Status Filters 


When you select Disk Status on the Filter tabs, the Availability Manager 
displays the OpenVMS Disk Status Filters page (Figure 7-11). 
Figure 7-11 OpenVMS Disk Status Filters


The OpenVMS Disk Status Summary page (Figure 3-14) displays the values you
set on this page.


This page lets you change the following default values: 


Data              Description

Error Count       The number of errors generated by the disk (a quick indicator of
                  device problems).

Transaction       The number of in-progress file system operations for the disk.

Mount Count       The number of nodes that have the specified disk mounted.

RWAIT Count       An indicator that a system I/O operation is stalled, usually during
                  normal connection failure recovery or volume processing of
                  host-based shadowing.


This page also lets you check the states of the disks you want to display, as 
described in the following table: 


Disk State      Description

Invalid         Disk is in an invalid state (Mount Verify Timeout is likely).
Shadow Member   Disk is a member of a shadow set.
Unavailable     Disk is set to unavailable.
Wrong Vol       Disk was mounted with the wrong volume name.
Mounted         Disk is logically mounted by a MOUNT command or a service call.
Mount Verify    Disk is waiting for a mount verification.
Offline         Disk is no longer physically mounted in the device drive.
Online          Disk is physically mounted in the device drive.


7.6.3 OpenVMS Disk Volume Filters 


When you select Disk Volume on the Filter tabs, the Availability Manager 
displays the OpenVMS Disk Volume Filters page (Figure 7-12). 


Figure 7-12 OpenVMS Disk Volume Filters 
The OpenVMS Disk Volume Filters page allows you to change the values for the
following data:

Data              Description

Used Blocks       The number of volume blocks in use.

Disk % Used       The percentage of the number of volume blocks in use in relation to
                  the total volume blocks available.

Free Blocks       The number of blocks of volume space available for new data.

Queue Length      The current length of the I/O queue for a volume.

Operations Rate   The rate at which the operations count to the volume has changed
                  since the last sampling. The rate measures the amount of activity
                  on a volume. The optimal load is device specific.


You can also turn the following options on (checked) or off (unchecked):
• RAMdisks: Show devices
• Sec. Page/Swap: Show devices

Secondary Page or Swap devices are disk volumes that have “PAGE” or
“SWAP” in the volume name. This filter is useful for filtering out disks that
are used only as page or swap devices.


• Wrtlocked Volumes: Show devices (for example, CDROM devices)


• Exclude Devices: Use device filter


You can exclude specific disk volumes by listing them in the Exclude Devices
text box. You can use wildcards to specify the disk volumes. Four examples
are shown in Figure 7-12.
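The volume filters combine a name test, a wildcard device test, and a percentage calculation. The following sketch is illustrative only; all function names are hypothetical. It assumes the OpenVMS wildcard conventions (`*` matches any run of characters and `%` matches exactly one) and case-insensitive matching of device names. The guide does not state which wildcard characters the Exclude Devices box accepts, so check Figure 7-12 for the actual forms.

```python
import re
from typing import Iterable


def vms_wildcard_to_regex(pattern: str):
    """Translate an OpenVMS-style wildcard: * = any run of characters, % = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "%":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # "$" and ":" in device names are literal
    return re.compile("".join(parts), re.IGNORECASE)


def excluded(device: str, patterns: Iterable[str]) -> bool:
    """True if the device matches any Exclude Devices entry."""
    return any(vms_wildcard_to_regex(p).fullmatch(device) for p in patterns)


def is_secondary_page_swap(volume_name: str) -> bool:
    """Secondary page or swap devices have "PAGE" or "SWAP" in the volume name."""
    name = volume_name.upper()
    return "PAGE" in name or "SWAP" in name


def percent_used(used_blocks: int, total_blocks: int) -> float:
    """Disk % Used: blocks in use relative to the total volume blocks."""
    return 100.0 * used_blocks / total_blocks
```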


7.6.4 OpenVMS I/O Filters 


When you select I/O on the Filter tabs, the Availability Manager displays the 
OpenVMS I/O Filters page (Figure 7-13). 


Figure 7-13 OpenVMS I/O Filters 
The OpenVMS I/O Summary page (Figure 3-12) displays the values you set on
this filters page. 


This filters page allows you to change values for the following data: 


Data                 Description

Direct I/O Rate      The rate of direct I/O transfers. Direct I/O is the average percentage
                     of time that the process waits for data to be read from or written to
                     a disk or tape. The possible state is DIO. Direct I/O is usually disk
                     or tape I/O.

Buffered I/O Rate    The rate of buffered I/O transfers. Buffered I/O is the average
                     percentage of time that the process waits for data to be read from or
                     written to a slower device such as a terminal, line printer, or mailbox.
                     The possible state is BIO. Buffered I/O is usually terminal I/O,
                     printer I/O, or network traffic.

Paging I/O Rate      The rate of read attempts necessary to satisfy page faults (also
                     known as Page Read I/O or the Hard Fault Rate).

Open File Count      The number of open files.

BIO lim Remaining    The number of remaining buffered I/O operations available before
                     the process reaches its quota. BIOLM quota is the maximum
                     number of buffered I/O operations a process can have outstanding
                     at one time.

DIO lim Remaining    The number of remaining direct I/O limit operations available
                     before the process reaches its quota. DIOLM quota is the maximum
                     number of direct I/O operations a process can have outstanding at
                     one time.

BYTLM Remaining      The number of buffered I/O bytes available before the process
                     reaches its quota. BYTLM is the maximum number of bytes of
                     nonpaged system dynamic memory that a process can claim at one
                     time.

Open File limit      The number of additional files the process can open before reaching
                     its quota. FILLM quota is the maximum number of files that can
                     be opened simultaneously by the process, including active network
                     logical links.
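Each "Remaining" column above is a quota minus the amount currently in use. The sketch below shows that arithmetic with a hypothetical snapshot of one process; the dictionary keys, the sample numbers, and the 10 percent "low" threshold are illustrative assumptions, not values the Availability Manager uses. Using the open file count for FILLM is an approximation, because FILLM also covers active network logical links.

```python
from typing import Dict, List


def quota_remaining(quotas: Dict[str, int], usage: Dict[str, int]) -> Dict[str, int]:
    """Remaining = quota - amount in use, for quotas such as BIOLM, DIOLM, BYTLM, FILLM."""
    return {name: limit - usage.get(name, 0) for name, limit in quotas.items()}


def low_quotas(quotas: Dict[str, int], usage: Dict[str, int],
               fraction: float = 0.1) -> List[str]:
    """Names of quotas whose remaining amount is below `fraction` of the quota."""
    remaining = quota_remaining(quotas, usage)
    return [name for name, limit in quotas.items() if remaining[name] < fraction * limit]
```

For example, with a hypothetical FILLM of 100 and 12 open files, the Open File limit column would show 88.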


7.6.5 OpenVMS Lock Contention Filters 


The OpenVMS Lock Contention Filters page allows you to remove (filter out) 
resource names from the Lock Contention page (Figure 3-19). 


When you select Lock Contention on the Filter tabs, the Availability Manager 
displays the OpenVMS Lock Contention Filters page (Figure 7-14). 
Figure 7-14 OpenVMS Lock Contention Filters


Each entry on the Lock Contention Filters page is a resource name or part of a
resource name that you want to filter out. For example, the STRIPE$ entry filters 
out any value that starts with the characters STRIPE$. To redisplay values set 
previously, select Use default values. 
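The STRIPE$ example describes prefix matching: a resource name is filtered out if it starts with any entry. Here is a minimal sketch of that rule; the function name and sample resource names are hypothetical, and the comparison is assumed to be case-sensitive.

```python
from typing import Iterable, List


def filter_resources(resource_names: Iterable[str],
                     filter_entries: Iterable[str]) -> List[str]:
    """Drop resource names that start with any filter entry (for example, STRIPE$)."""
    prefixes = tuple(filter_entries)  # str.startswith accepts a tuple of prefixes
    return [r for r in resource_names if not r.startswith(prefixes)]
```

An empty filter list keeps every name, because `startswith` with an empty tuple is always false.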


7.6.6 OpenVMS Memory Filters 


When you select Memory on the Filter tabs, the Availability Manager
displays an OpenVMS Memory Filters page that is similar to the one shown in
Figure 7-15.
Figure 7-15 OpenVMS Memory Filters


The OpenVMS Memory page (Figure 3-10) displays the values you set on this filter page.


The OpenVMS Memory Filters page allows you to change values for the following 
data: 


Data Description 


Working Set Count The number of physical pages or pagelets of memory that the 
process is using. 


Working Set Size The number of pages or pagelets of memory the process is allowed 
to use. The operating system periodically adjusts this value based 
on an analysis of page faults relative to CPU time used. An increase 
in this value in large units indicates a process is receiving a lot of 
page faults and its memory allocation is increasing. 


Working Set Extent The number of pages or pagelets of memory in the process’s 
WSEXTENT quota as defined in the user authorization file (UAF). 
The number of pages or pagelets will not exceed the value of the 
system parameter WSMAX. 


Page Fault Rate The number of page faults per second for the process. 


Page I/O Rate The rate of read attempts necessary to satisfy page faults (also
known as page read I/O or the hard fault rate).


7.6.7 OpenVMS Page/Swap File Filters 
When you select Page/Swap File on the Filter tabs, the Availability Manager 
displays the OpenVMS Page/Swap File Filters page (Figure 7-16). 




Figure 7-16 OpenVMS Page/Swap File Filters 
The OpenVMS I/O Summary page (Figure 3-12) displays the values that you set
on this filter page. 


This filter page allows you to change values for the following data: 


Data Description 


Used Blocks The number of used blocks within the file. 

Page File % Used The percentage of the blocks from the page file that have been used. 
Swap File % Used The percentage of the blocks from the swap file that have been used. 
Total Blocks The total number of blocks in paging and swapping files. 


Reservable Blocks Number of reservable blocks in each paging and swapping file 
currently installed. Reservable blocks can be logically claimed by a 
process for a future physical allocation. A negative value indicates 
that the file might be overcommitted. Note that a negative value is 
not an immediate concern but indicates that the file might become 
overcommitted if physical memory becomes scarce. 


Note: Reservable blocks are not used in more recent versions of 
OpenVMS. 
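The Page File % Used value and the meaning of a negative reservable-block count can be expressed in a few lines. This Python sketch is illustrative only: the `PagingFileSample` record and both helper names are hypothetical, and the sample values are invented rather than taken from a real system.

```python
from dataclasses import dataclass


@dataclass
class PagingFileSample:
    """Hypothetical record of one row on the Page/Swap File page."""
    name: str
    used_blocks: int
    total_blocks: int
    reservable_blocks: int


def percent_used(sample: PagingFileSample) -> float:
    """Percentage of the file's blocks that have been used."""
    if sample.total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return 100.0 * sample.used_blocks / sample.total_blocks


def may_be_overcommitted(sample: PagingFileSample) -> bool:
    """A negative reservable count means the file might become overcommitted."""
    return sample.reservable_blocks < 0


if __name__ == "__main__":
    # Illustrative values only, not data from a real system.
    page_file = PagingFileSample("PAGEFILE.SYS", 2500, 10000, -120)
    print(percent_used(page_file))          # 25.0
    print(may_be_overcommitted(page_file))  # True
```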


You can also select (turn on) or clear (turn off) the following options: 
• Show page files
• Show swap files
7.7 Customizing Event Escalation


You can customize the way events are displayed in the Event pane of the System
Overview window (Figure 2-1) and configure events to be signaled to OPCOM or
HP OpenView. To do this, you set the criteria that determine whether events
are signaled on the Event Escalation Customization page (Figure 7-17).


Note 


Event escalation is the one set of Availability Manager parameters that 
you can adjust at all four configuration levels (Application, Operating 
System, Group, and Node). 


When you select any of the customization options, the Availability Manager 
displays a tabbed page similar to the one shown in Figure 7-17. 


Figure 7-17 Event Escalation Customization 
The Event Escalation Customization page contains the following sections:
• Event Window


With the exception of “Informational event timeout (secs)”, the items in this 
section are dimmed because they have not yet been implemented. However, 
you can set the number of seconds that an informational event is displayed in 
the Event pane of the System Overview window (Figure 2-1). (The default is 
30 seconds.) 


• OPCOM


The items in this section are dimmed if you are not using an OpenVMS 
system. 
If you are using an OpenVMS system, you can check the box in the OPCOM
section of the page and then enter two values that work together to determine
whether an event is sent to OPCOM:
- Escalate events over severity threshold (0-100)
The severity level over which an event might be sent to OPCOM if the
second criterion is met.
- Timeout triggering escalation of events (secs)
The length of time, in seconds, that an event (over a severity threshold
that you have entered) is displayed in the Event pane of the System
Overview window (Figure 2-1) before the event is sent to OPCOM.


• HP OpenView


Values that you enter have no effect if you do not have HP OpenView agents
installed and configured on your system. (For configuration instructions, see
Section 7.7.1.)


If HP OpenView agents are installed and configured on your system, you can
check the box in the OpenView section of the page and then enter two values
that work together to determine whether an event is sent to OpenView:

- Escalate events over severity threshold (0-100)
The severity level over which an event might be sent to OpenView if the
second criterion is met.

- Timeout triggering escalation of events (secs)
The length of time, in seconds, that an event (over a certain severity
threshold) is displayed in the Event pane of the System Overview window
(see Figure 2-1) before the event is sent to OpenView.


The following table compares Availability Manager and OpenView severity
levels:

Availability Manager OpenView
0 - 19 Normal
20 - 39 Warning
40 - 59 Minor
60 - 79 Major
80 - 100 Critical 
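The severity comparison in the table above amounts to a banded lookup. The following Python sketch restates it; the function name `openview_severity` is hypothetical, and the range check simply reflects the documented 0-100 scale.

```python
def openview_severity(am_severity: int) -> str:
    """Map an Availability Manager severity (0-100) to an OpenView level."""
    if not 0 <= am_severity <= 100:
        raise ValueError(f"severity must be 0-100, got {am_severity}")
    # Upper bound of each band, from the comparison table.
    bands = ((19, "Normal"), (39, "Warning"), (59, "Minor"), (79, "Major"))
    for upper, level in bands:
        if am_severity <= upper:
            return level
    return "Critical"  # 80-100


if __name__ == "__main__":
    for value in (0, 20, 59, 60, 100):
        print(value, openview_severity(value))
```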


Important 


For an event to be escalated using OPCOM or HP OpenView, the following 
conditions must be met: 


• On the Event Customizations page (Figure 7-18), the OPCOM or HP
OpenView box must be checked.

• On the Event Escalation page (Figure 7-17), the box in the OPCOM
or HP OpenView section of the page must be checked.

• The severity of the event, which is shown on the Event Customizations
page (Figure 7-18), must meet or exceed the corresponding severity
threshold on the Event Escalation page (Figure 7-17).

• The event must be displayed in the Event pane of the System
Overview window (Figure 2-1) for the required length of time before
the event is sent to OPCOM or OpenView. (The default is 10 minutes.) 
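The four conditions listed above must all hold at once. The Python sketch below models them as a single predicate; `EscalationTarget` and `should_escalate` are hypothetical names, not part of the product, and the comparisons use "meet or exceed" (>=) as the list states.

```python
from dataclasses import dataclass


@dataclass
class EscalationTarget:
    """Hypothetical model of the OPCOM or HP OpenView settings for one event."""
    action_checked: bool       # OPCOM/HP OpenView box on Event Customizations
    section_checked: bool      # box in the OPCOM/OpenView section of Event Escalation
    severity_threshold: int    # "Escalate events over severity threshold (0-100)"
    timeout_secs: float        # "Timeout triggering escalation of events (secs)"


def should_escalate(target: EscalationTarget, event_severity: int,
                    secs_displayed: float) -> bool:
    """Return True only when all four escalation conditions hold."""
    return (
        target.action_checked
        and target.section_checked
        and event_severity >= target.severity_threshold
        and secs_displayed >= target.timeout_secs
    )


if __name__ == "__main__":
    opcom = EscalationTarget(True, True, severity_threshold=60, timeout_secs=600)
    print(should_escalate(opcom, event_severity=80, secs_displayed=600))  # True
    print(should_escalate(opcom, event_severity=80, secs_displayed=30))   # False
```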


Figure 7-18 Event Customizations 
7.7.1 Configuring HP OpenView on Your Windows or HP-UX System
Note 


The instructions in this section are for configuring HP OpenView 
on Windows. (The configuration for HP-UX systems is very similar; 
instructions, however, are not included in this section.) 


Installing the HP OpenView Server 
Prior to configuring HP OpenView, you must perform two steps: 


1. Install the HP OpenView server software on a Windows or an HP-UX system. 
(The Availability Manager can forward events to either a Windows or an 
HP-UX system.) For information about performing these installations, see the 
HP OpenView documentation. 


2. Install the HP OpenView template for the Availability Manager on the 
HP OpenView server. This is described in the Guide for Setting Up the 
Availability Manager to Forward Events to OpenView on the Documentation 
page on the Availability Manager Web site: 


http://h71000.www7.hp.com/openvms/products/availman/docs.html
Configuring the HP OpenView Server and Agents
You can run the Availability Manager on a Windows or on an OpenVMS system. 


If you run the Availability Manager on a Windows system, follow these steps: 


1. Configure the HP OpenView server so that the Windows system is a 
configured node. 


2. Deploy the Availability Manager template, AvailMan, to the Windows system. 


The AvailMan template is stored under "Policy management \ Policies grouped 
by type" in the OpenView Operations window: 


HP OpenView\Operations Manager 
If you run the Availability Manager on an OpenVMS system, follow these steps: 


1. Install and configure the HP OpenView agents on the OpenVMS system
according to the instructions in the document “About OpenVMS Managed 
Nodes,” which is a link on the HP OpenView Agents for OpenVMS Web page: 


http://h71000.www7.hp.com/openvms/products/openvms_ovo_agent/index.html
2. Deploy the Availability Manager template, AvailMan, to the OpenVMS 
system. 
7.7.2 Using HP OpenView on Your System 


On the OpenView server, you can create or modify policies or templates of
the Open Message Interface group to manipulate events that the Availability
Manager has escalated. For the parameters and option fields set by the
Availability Manager, see Table 7-5.


Table 7-5 Parameters and Option Fields Used with OpenView

Parameter or Option Field     Description

<$MSG_APPL>                   Application: "AvailMan" (appears to be case
                              sensitive)
<$MSG_OBJECT>                 Object: 6-character event name (example:
                              "HIBIOR")
<$MSG_GRP>                    Group: node originating the event (example:
                              "CMOVEQ")
<$MSG_SEV>                    Derived from <$OPTION(SEVERITY)> in the
                              Availability Manager; the Availability Manager
                              maps SEVERITY to NORMAL, WARNING, MINOR,
                              MAJOR, or CRITICAL
<$MSG_TEXT>                   Message text: event description (example:
                              "CMOVEQ buffered I/O rate is high")
<$MSG_NODE>                   Node running AvailMan
<$MSG_NODE_NAME>              Node running AvailMan
<$OPTION(NODE)>               Node originating the event (example: "CMOVEQ")
<$OPTION(GROUP)>              Group to which the originating node belongs
                              (example: "Debug cluster")
<$OPTION(SEQUENCE_NUMBER)>    AM internal event sequence number (example:
                              "14")
<$OPTION(SEVERITY)>           AM event severity (0-100) (example: "60")
<$OPTION(EVENT)>              6-character event name (example: "HIBIOR")
<$OPTION(TIME)>               Original time the event was posted (example:
                              "15-Aug-2005 14:41:44,164")


7.8 Customizing Events and User Notification of Events 


You can customize a number of characteristics of the events that are displayed in 
the Event pane of the System Overview window (Figure 2-1). You can also use 
customization options to notify users when specific events occur. 


When you select Operating System -> Customize OpenVMS... or
Operating System -> Customize Windows NT... on the Customize menu of the
System Overview window, the Availability Manager displays a tabbed page
similar to the one shown in Figure 7-19.


Figure 7-19 Event Customizations 
On OpenVMS systems, you can customize events at the operating system,
group, or node level. On Windows systems, you can customize events at the
operating system or node level.


Keep in mind that an event that you customize at the group level overrides the 
value set at a previous (higher) level (see Table 7-1). 
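The override rule can be sketched as "the most specific level that sets a value wins." In this hypothetical Python illustration, the level names follow the four configuration levels listed in Section 7.7 (Application, Operating System, Group, Node), a level that is absent or set to None stands for "Use default values," and the helper name `effective_value` is invented.

```python
from typing import Optional

# Configuration levels from most general to most specific; a value set at a
# more specific level overrides the values set at more general levels.
LEVELS = ("application", "operating_system", "group", "node")


def effective_value(settings: dict[str, Optional[int]]) -> Optional[int]:
    """Return the value from the most specific level that defines one.

    A level that is missing or set to None is treated as "Use default
    values", so the value is inherited from the next more general level.
    """
    for level in reversed(LEVELS):
        value = settings.get(level)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    severity = {"operating_system": 40, "group": 60}
    print(effective_value(severity))                   # 60 (group overrides OS)
    print(effective_value({**severity, "node": 80}))   # 80 (node overrides group)
```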
7.8.1 Customizing Events


You can change the values for any data on this page that is available (that
is, not dimmed). The following table describes the data you can change:


Data                Description

Severity            Controls the severity level at which events are displayed in
                    the Event pane of the System Overview window (Figure 2-1).
                    By default, all events are displayed. Increasing this value
                    reduces the number of event messages in the Event pane and
                    can improve perceived response time.

Occurrence          Each Availability Manager event is assigned an occurrence
                    value, that is, the number of consecutive data samples that
                    must exceed the event threshold before the event is
                    signaled. By default, events have low occurrence values.
                    However, you might find that a certain event indicates a
                    problem only when it occurs repeatedly over an extended
                    period of time. You can change the occurrence value
                    assigned to that event so that the Availability Manager
                    signals the event only when necessary.

                    For example, suppose page fault spikes are common in your
                    environment, and the Availability Manager frequently
                    signals intermittent HITTLP (total page fault rate is high)
                    events. You could change the event's occurrence value to 3,
                    so that the total page fault rate must exceed the threshold
                    for three consecutive collection intervals before the event
                    is signaled to the event log.

                    To avoid displaying insignificant events, you can customize
                    an event so that the Availability Manager signals it only
                    when it occurs continuously.

Threshold           Most events are checked against only one threshold;
                    however, some events have dual thresholds, where the event
                    is triggered if either one is true. For example, for the
                    LOVLSP (node disk volume free space is low) event, the
                    Availability Manager checks both of the following
                    thresholds:

                    • Number of blocks remaining
                    • Percentage of total blocks remaining

Escalation actions  You can enter one or more of the following values:

                    • User: If the event occurs, the Availability Manager
                      refers to the User Action field to determine what action
                      to take.
                    • OPCOM: If the event occurs and certain conditions are met
                      (see Section 7.7), the Availability Manager passes that
                      event to OPCOM. (Data Analyzer on OpenVMS only)
                    • HP OpenView: If the event occurs and certain conditions
                      are met (see Section 7.7), the Availability Manager
                      passes that event to HP OpenView. (OpenView agents must
                      be installed and configured on the Data Analyzer node.)

User Action         When the Event escalation action field is set to User, User
                    Action is no longer dimmed. You can enter the name of a
                    procedure to be executed if the event displayed at the top
                    of the page occurs. To use this field, see the instructions
                    in Section 7.8.2.


The “Event explanation and investigation hints” section of the Event
Customizations page, which is not customizable, includes a description of the
event displayed and suggestions for how to correct any problems that the event
signals. 
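The Occurrence and Threshold settings described in the table above can be modeled roughly as follows. This Python sketch is an assumption-laden illustration, not the product's logic: the class and function names are hypothetical, the counter keeps returning True while the condition persists (whether the product re-signals on later samples is not described here), and the strict "<" comparisons and threshold values in `free_space_low` are invented for the example.

```python
class OccurrenceCounter:
    """Track consecutive samples that exceed an event threshold.

    Hypothetical sketch: add_sample returns True once at least `occurrence`
    consecutive samples have exceeded the threshold; the count restarts
    whenever a sample does not exceed it.
    """

    def __init__(self, occurrence: int) -> None:
        if occurrence < 1:
            raise ValueError("occurrence must be at least 1")
        self.occurrence = occurrence
        self.consecutive = 0

    def add_sample(self, exceeds_threshold: bool) -> bool:
        """Record one collection interval; return True if the event is signaled."""
        self.consecutive = self.consecutive + 1 if exceeds_threshold else 0
        return self.consecutive >= self.occurrence


def free_space_low(free_blocks: int, total_blocks: int,
                   min_blocks: int, min_percent: float) -> bool:
    """Dual threshold in the style of LOVLSP: true if EITHER check is crossed."""
    percent_free = 100.0 * free_blocks / total_blocks
    return free_blocks < min_blocks or percent_free < min_percent


if __name__ == "__main__":
    hittlp = OccurrenceCounter(occurrence=3)
    samples = [True, True, False, True, True, True]
    print([hittlp.add_sample(s) for s in samples])
    # [False, False, False, False, False, True]
    print(free_space_low(5000, 1_000_000, min_blocks=1000, min_percent=1.0))  # True
```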
7.8.2 Entering a User Action


Note 


OpenVMS and Windows execute the User Action procedure somewhat 
differently, as explained in the following paragraphs. 


The following notes pertain to writing and executing User Action commands or 
command procedures. These notes apply to User Actions on both OpenVMS and 
Windows systems. 


The procedure that you specify as the User Action is executed in the following 
manner: 


- It is issued to the operating system that is running the Availability
Manager Data Analyzer.

- It is issued as a process separate from the one running the Availability
Manager, so that it does not affect the Availability Manager's operation.

- It is run under the same account as the one running the Availability
Manager Data Analyzer.


User Actions are intended to execute procedures that do not require 
interactive displays or user input. 


You can enter User Actions for events on either a systemwide basis or a 
per-node basis: 


- On a systemwide basis, the User Action is issued for an event that occurs
on any node.

- On a per-node basis, the User Action is issued for an event that occurs
only on a specific node.


If event logging is enabled, the Availability Manager writes events to the
event log file (called AnalyzerEvents.log by default on both OpenVMS and
Windows systems). A status line that repeats the original event line indicates
whether the User Action was successfully issued. For example:


AMGR/KOINE -- 13-Apr-2005 15:33:02.531 --<0,CFGDON>KOINE configuration done
AMGR/KOINE -- 13-Apr-2005 15:33:02.531 --<0,CFGDON>KOINE configuration done
(User Action issued for this event on the client O/S)


Other events might appear between the first logging and the status line. The 
log file does not indicate whether the User Action executed successfully. You
must obtain the execution status from the operating system, for example, the
OpenVMS batch procedure log. 
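If you scan the event log for status lines, a small parser can help. The following Python sketch is based only on the sample lines shown above and assumes that the status note appears on the same logical line as the repeated event text; the regular expression, the marker string, and the name `parse_log_line` are hypothetical. As noted above, a status line shows only that the User Action was issued, not that it succeeded.

```python
import re

# The "<n,EVENT>" field seen in the sample lines, for example "<0,CFGDON>".
# The meaning of the numeric part is not described in this section, so it is
# returned only as an integer.
EVENT_FIELD = re.compile(r"<(\d+),([A-Z0-9]+)>")
STATUS_MARKER = "(User Action issued for this event"


def parse_log_line(line: str):
    """Return (number, event_name, user_action_issued), or None if no event field."""
    match = EVENT_FIELD.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), STATUS_MARKER in line


if __name__ == "__main__":
    original = "AMGR/KOINE -- 13-Apr-2005 15:33:02.531 --<0,CFGDON>KOINE configuration done"
    status = original + " (User Action issued for this event on the client O/S)"
    print(parse_log_line(original))  # (0, 'CFGDON', False)
    print(parse_log_line(status))    # (0, 'CFGDON', True)
```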


The User Action functionality might be enhanced in a future release of the 
Availability Manager, but backward compatibility is not guaranteed for the 
format of User Action procedure strings or for the method of executing the 
procedures on a particular operating system. 
7.8.2.1 Executing a Procedure on an OpenVMS System


The User Action procedure and arguments are passed as string values to the DCL 
command interpreter as follows: 


SUBMIT/NOPRINTER/LOG user_action_procedure arg_1 arg_2 arg_3 arg_4 
where: 


• SUBMIT/NOPRINTER/LOG is the DCL SUBMIT command with its associated
qualifiers.

• user_action_procedure is a valid OpenVMS file name.

Enter the name of the procedure you want OpenVMS to execute using the
following format:

disk:[directory]filename.COM

where:

- disk is the name of the disk where the procedure resides.
- directory is the name of the directory where the procedure resides.
- filename.COM is the file name of the command procedure you want
OpenVMS to execute. The file name must follow OpenVMS file-naming
conventions.

The file you submit must contain one or more DCL command statements that
form a valid OpenVMS command procedure.

• The arguments correspond to the following data:

Argument Description
arg_1    Node name of the node that generated the event.
arg_2    Date and time that the event was generated.
arg_3    Name of the event.
arg_4    Description of the event.


The Availability Manager does not interpret the string contents. You can supply 
any content in the User Action procedure that DCL accepts in the OpenVMS 
environment for the user account running the Data Analyzer. However, if you 
include arguments in the User Action procedure, they might displace or overwrite 
arguments supplied by the Availability Manager. 


A suitable batch queue must be available on the Data Analyzer computer to be 
the target of the SUBMIT command. See the HP OpenVMS DCL Dictionary for 
the SUBMIT, INITIALIZE/QUEUE, and START/QUEUE commands for use of 
batch queues and the queue manager. 


The following is an example of a DCL command procedure: 


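$ ! P1 = node, P2 = date and time, P3 = event name, P4 = event description
$ if (p3.eqs."DSKOFF") .and. (p1.eqs."PAYROL")
$ then
$  ! Disk on node PAYROL went off line: notify several groups
$  mail/subject="''p2' ''p3' ''p4'" urgent_instructions.txt -
      call_center, finance, adams
$ else
$  ! Any other event: notify only the call center
$  mail/subject="''p2' ''p3' ''p4'" instructions.txt call_center
$ endif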


The symbols P1 through P4 in the DCL procedure correspond in type, number,
and position to the arguments in the preceding table.
You might use a procedure like this one to notify several groups if the payroll disk
goes off line, or to notify the call center if any other event occurs. 


7.8.2.2 Executing a Procedure on a Windows System 


The user action procedure and arguments are passed as string values to the 
Windows command interpreter as follows: 


"AT time CMD/C user_action_procedure arg_1 arg_2 arg_3 arg_4" 


where: 


• AT is the Windows command that schedules commands and programs to run at
a specified time and date.

• The time substring is a short period of time (approximately 2 minutes) in
the future, so that the AT utility processes the user action procedure today
rather than tomorrow. This is necessary because the AT utility cannot execute
a procedure “now”; it requires an explicitly stated time.

• user_action_procedure is a Windows command or valid file name. The file
must contain one or more Windows command statements to form a valid
command procedure. (See the example in this section.)

Enter the name of the procedure you want Windows to execute using the
following format:

device:\directory\filename.BAT

where:
- device is the disk on which the procedure is located.
- directory is the folder in which the procedure is located.
- filename.BAT is the name of the command file to be executed.

Notes

The file name must follow Windows file-naming conventions. However,
because of the way the Java JRE processes spaces, HP recommends that you
do not use spaces in a path or file name.

HP recommends that you use a batch file to process and call procedures
and applications.

• The arguments are listed in the table in Section 7.8.2.1.


The Availability Manager does not interpret the string contents. You can supply 
any content in the string that the Windows command-line interpreter accepts for 
the user account running the Data Analyzer. However, if you include arguments 
in the User Action procedure, they might displace or overwrite arguments 
supplied by the Availability Manager. 


You cannot specify positional command-line switches or arguments to the AT 
command, although you can include switches in the User Action procedure 
substring as qualifiers to the user-supplied command. This is a limitation of both 
the Windows command-line interpreter and the way the entire string is passed 
from the Availability Manager to Windows. 
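The AT string described above, with its time roughly 2 minutes in the future, can be sketched as follows. The helper names `at_time` and `build_at_command` are hypothetical, the 24-hour HH:MM time format is an assumption about what AT accepts, and the arguments are joined unquoted because this section does not describe how the Availability Manager quotes them.

```python
from datetime import datetime, timedelta


def at_time(now: datetime, delay_minutes: int = 2) -> str:
    """Return a 24-hour HH:MM time about `delay_minutes` in the future."""
    return (now + timedelta(minutes=delay_minutes)).strftime("%H:%M")


def build_at_command(now: datetime, procedure: str, args: list[str]) -> str:
    """Assemble 'AT time CMD/C user_action_procedure arg_1 ... arg_4'.

    Hypothetical sketch: arguments are joined with single spaces and are
    not quoted, because the quoting rules are not described here.
    """
    return " ".join(["AT", at_time(now), "CMD/C", procedure, *args])


if __name__ == "__main__":
    now = datetime(2005, 4, 13, 15, 33, 2)
    # Shortened, illustrative argument list.
    print(build_at_command(now, r"c:\send_message.bat", ["KOINE", "CFGDON"]))
    # AT 15:35 CMD/C c:\send_message.bat KOINE CFGDON
```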




The Schedule service must be running on the Data Analyzer computer in order to
use the AT command. However, the Schedule service does not run by default. To
start the Schedule service, see the Windows documentation for instructions on
using CONTROL PANEL->SERVICES->SCHEDULE->[Startup button].


Windows Example 
To set up a user action, follow these steps: 


1. Select an event on the Event Customizations page, for example, HIBIOR (see
Figure 7-20).

2. Change the Event escalation action to User.

3. Enter the name of the program to run. For example:

c:\send_message.bat


Figure 7-20 User Action Example

[Screenshot: Event Customizations page for the HIPWIO (high paging write I/O rate) event, with User selected as the escalation action and c:\send_message.bat entered as the User Action]
The command line parameters are automatically added when the Availability
Manager passes the command to the command processor.
The contents of "send_message.bat" are the following:
REM %1 through %4 receive the arguments appended by the Availability Manager
net send affc17 "P4:system event: %1 %2 %3 %4"


On the target node, AFFC17, a message similar to the following is displayed: 


[Figure: Messenger Service dialog box showing the message]
You can now apply the User Action to one node, all nodes, or a group of nodes, as
explained in Section 7.8.2. 


7.9 Customizing Security Features 
The following sections explain how to change the following security features: 
• Passwords for groups and nodes
• Data Analyzer passwords for OpenVMS and Windows Data Collector nodes
• Security triplets on OpenVMS Data Collector nodes
• Password on a Windows Data Collector node


Note 


OpenVMS Data Collector nodes can have more than one password: each 
password is part of a security triplet. (Windows nodes allow you to have 
only one password per node.) 


7.9.1 Customizing Passwords for Groups and Nodes 


At the operating system, group, and node levels, the Windows and OpenVMS
customization pages are similar to the one shown in Figure 7-6, and each
contains a tab labeled Security. If you select this tab on either system, the
Availability Manager displays a page similar to the one shown in Figure 7-21.


Figure 7-21 OpenVMS Security Customization

[Screenshot: Security tab with the Collector Password field and the prompt "Enter an 8-character Data Collector password."]


The level at which you can make password changes depends on whether you 
select the Security tab at the operating system, group, or node level. 
Changing Passwords at the Group Level

If you monitor several groups, but the password for the nodes in one of those 
groups is different from the password for nodes in other groups, right-click the 
group you want to change, select Customize from the list, select the Security 
tab, and change the password. The new password is then used for each node that 
is a member of that group. 


Changing Passwords at the Node Level 

As a second example, to change the password of one node in a group to a 
different password than the other nodes in the group, right-click that node, select 
Customize from the list, select the Security tab, and change the password
to one that differs from the other nodes in the group. For that node, the new
password overrides the group password. 


In the second password example, if you want to set the password for the single 
node back to the password that the rest of the group uses, click Use default 
values. The password value for the node now comes from the group-level 
password setting. At this point, if you change the group password, all nodes 
in the group get the new password. Additional information about changing 
passwords for security is in Section 7.9. 


7.9.2 Changing Data Analyzer Passwords 


You can change the passwords that the Windows Data Analyzer uses for 
OpenVMS Data Collector nodes and for Windows Data Collector nodes. The 
following sections explain how to perform both actions. 


7.9.2.1 Changing a Data Analyzer Password for an OpenVMS Data Collector Node 
When you select Customize OpenVMS... on the Customize menu of the System
Overview window, the Availability Manager displays a default customization
page. Selecting the Security tab on that page displays the OpenVMS Security
Customization page (Figure 7-21).


To change the default password that the Data Analyzer uses to access OpenVMS
Data Collector nodes, enter a password of exactly 8 uppercase alphanumeric
characters. The Availability Manager will use this password to access OpenVMS 
Data Collector nodes. This password must match the password that is part of the 
OpenVMS Data Collector security triplet (Section 1.3.3). 


When you are satisfied with your password, click OK. Exit the Availability 
Manager, and restart the application for the password to take effect. 


7.9.2.2 Changing a Data Analyzer Password for a Windows Data Collector Node 


When you select Customize Windows NT... on the Customize menu of the 
System Overview window, the Availability Manager displays a Windows Security 
Customization page (Figure 7-22). 
Figure 7-22 Windows Security Customization

[Screenshot: Security tab of the Customization - Windows NT Default Settings page, with AvailMan in the Collector Password field and the prompt "Enter an 8-character Data Collector password."]


To change the default password that the Data Analyzer uses to access Windows
Data Collector nodes, enter a password of exactly 8 alphanumeric characters.
Note that this password is case sensitive; any time you type it, you must use the 
original capitalization. 


This password must also match the password for the Windows Data Collector 
node that you want to access. (See Section 7.9.3 for instructions for changing that 
password.) 


When you are satisfied with your password, click OK. Exit and restart the 
Availability Manager for the password to take effect. 
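The password rules in Sections 7.9.2.1 and 7.9.2.2 can be checked before you type a password into either Security page. This Python sketch is hypothetical (the function names are invented) and assumes that "alphanumeric" means the ASCII letters A-Z and a-z and the digits 0-9.

```python
import re

# OpenVMS: exactly 8 uppercase alphanumeric characters.
OPENVMS_RULE = re.compile(r"[A-Z0-9]{8}")
# Windows: exactly 8 alphanumeric characters; case is significant.
WINDOWS_RULE = re.compile(r"[A-Za-z0-9]{8}")


def valid_openvms_password(password: str) -> bool:
    """Check the OpenVMS Data Collector password rule."""
    return OPENVMS_RULE.fullmatch(password) is not None


def valid_windows_password(password: str) -> bool:
    """Check the Windows Data Collector password rule."""
    return WINDOWS_RULE.fullmatch(password) is not None


if __name__ == "__main__":
    for candidate in ("AVAILMAN", "AvailMan", "SHORT1"):
        print(candidate, valid_openvms_password(candidate),
              valid_windows_password(candidate))
```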


7.9.3 Changing a Password on a Windows Data Collector 
To change the Data Collector password in the Registry, follow these steps: 


1. Click the Windows Start button. On the menu displayed, first select
Programs, then Accessories, and then Command Prompt.

2. Type regedit after the angle prompt (>).

The system displays a screen for the Registry Editor, with a list of entries
under My Computer.

3. On the list displayed, expand the HKEY_LOCAL_MACHINE entry.

4. Double-click SYSTEM.

5. Click CurrentControlSet.

6. Click Services.

7. Click damdrvr.

8. Click Parameters.

9. Double-click Read Password. Then type a new 8-character alphanumeric
password, and click OK to make the change.

10. To store the new password, click Exit under File on the main menu bar.

11. On the Control Panel, select Services and then Stop for “PerfServ.”

12. Again on the Control Panel, select Devices and then Stop for “damdrvr.”

13. First restart damdrvr under “Devices” and then restart PerfServ under
“Services.”

This step completes the change of your Data Collector password.


7.10 Monitoring Processes on a Node 


As the Availability Manager monitors all the processes on the system, you can 
configure the tool to notify you when particular processes are created or exit 
on your system. The Availability Manager can watch up to eight processes
on an individual node or a node that is part of a group that you select on the
Group/Node Inclusion Lists page (Figure 7-3). (You cannot, however, use this 
feature to notify you about processes that should not be there.) 


To use the Watch Process feature, right-click a node in the Node pane to display 
the Node pane Customization Menu (Figure 7-23). 


Figure 7-23 Node Customization Menu 
On this menu, select Customize to display the options available for customizing
individual nodes. One of these options is Watch Process, which you can select to
display the Watch Process page (Figure 7-24).
Figure 7-24 Process Watch

[Screenshot: Watch Process tab with a Processes to Watch list on the left and a Watch Process Configuration Explanation on the right, which notes that process names are case sensitive]


An explanation of the watch process feature is displayed on the right side of the 
page. You can enter up to eight processes in the box on the left side of the page.
After you enter process names, the Availability Manager monitors these processes 
on the node that you have selected. 


For a process that is not present on the node at the time you enter it on the
Watch Process page, the Availability Manager displays the following event in the
Event pane of the System Overview window (Figure 2-1): 


NOPROC -- The process process-name has disappeared on 
the node node-name. 


If a process that was signaled by a NOPROC event reappears on the node,
the Availability Manager displays the following event in the Event pane of the 
System Overview window (Figure 2-1): 


PRCFND -- The process process-name has recently 
reappeared on the node node-name. 
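The NOPROC and PRCFND behavior described above can be modeled as a small state machine. This Python sketch is hypothetical: `ProcessWatch` and `poll` are invented names, and it assumes that a missing process is reported only once until it reappears. It does reflect two documented points: at most eight processes per node, and case-sensitive process names.

```python
class ProcessWatch:
    """Hypothetical model of the NOPROC/PRCFND behavior of Watch Process."""

    MAX_PROCESSES = 8

    def __init__(self, node: str, names: list[str]) -> None:
        if len(names) > self.MAX_PROCESSES:
            raise ValueError("you can watch at most eight processes per node")
        self.node = node
        self.names = list(names)  # compared case-sensitively
        self.missing: set[str] = set()

    def poll(self, running: set[str]) -> list[str]:
        """Compare watched names with the running process names; return new events."""
        events = []
        for name in self.names:
            if name not in running and name not in self.missing:
                self.missing.add(name)
                events.append(f"NOPROC -- The process {name} has disappeared "
                              f"on the node {self.node}.")
            elif name in running and name in self.missing:
                self.missing.remove(name)
                events.append(f"PRCFND -- The process {name} has recently "
                              f"reappeared on the node {self.node}.")
        return events


if __name__ == "__main__":
    watch = ProcessWatch("KOINE", ["BATCH_SERVER"])
    print(watch.poll(set()))             # one NOPROC event
    print(watch.poll(set()))             # [] (already reported)
    print(watch.poll({"BATCH_SERVER"}))  # one PRCFND event
```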
The CPU process states shown in Table A-1 are displayed in the OpenVMS CPU
Process States page (Figure 3-8) and in the OpenVMS Process Information page 
(Figure 3-23). 


Table A-1 CPU Process States


Process State  Description

CEF     Common Event Flag, waiting for a common event flag

COLPG   Collided Page Wait, involuntary wait state; likely to indicate a memory
        shortage, waiting for hard page faults

COM     Computable; ready to execute

COMO    Computable Outswapped, COM, but swapped out

CUR     Current, currently executing in a CPU

FPG     Free Page Wait, involuntary wait state; most likely indicates a memory
        shortage

LEF     Local Event Flag, waiting for a Local Event Flag

LEFO    Local Event Flag Outswapped; LEF, but outswapped

HIB     Hibernate, voluntary wait state requested by the process; it is inactive

HIBO    Hibernate Outswapped, hibernating but swapped out

MWAIT   Miscellaneous Resource Wait, involuntary wait state, possibly caused by a
        shortage of a systemwide resource, such as no page or swap file capacity
        or no synchronizations for single-threaded code.

        Types of MWAIT states are shown in the following table:

        MWAIT State  Definition

        BWAIT   Process waiting for buffered I/O byte count quota.
        JWAIT   Process in either BWAIT or TWAIT state.
        TWAIT   Process waiting for timer queue entry quota.
        EXH     Kernel thread in exit handler (not currently used).
        IMODE   Kernel thread waiting to acquire inner-mode semaphore.
        PSXFR   Process waiting during a POSIX fork operation.
        RWAST   Process waiting for system or special kernel mode AST.
        RWMBX   Process waiting because mailbox is full.
        RWNPG   Process waiting for nonpaged dynamic memory.
        RWPFF   Process waiting because page file is full.
        RWPAG   Process waiting for paged dynamic memory.
        RWMPE   Process waiting because modified page list is empty.
        RWMPB   Process waiting because modified page writer is busy.
        RWSCS   Process waiting for distributed lock manager.
        RWCLU   Process waiting because OpenVMS Cluster is in transition.
        RWCAP   Process waiting for CPU that has its capability set.
        RWCSV   Kernel thread waiting for request completion by OpenVMS
                Cluster server process.

PFW     Page Fault Wait, involuntary wait state; possibly indicates a memory
        shortage, waiting for hard page faults.

RWAST   Resource Wait State, waiting for delivery of an asynchronous system trap
        (AST) that signals a resource availability; usually an I/O is
        outstanding or a process quota is exhausted.

RWBRK   Resource Wait for BROADCAST to finish

RWCAP   Resource Wait for CPU Capability

RWCLU   Resource Wait for Cluster Transition

RWCSV   Resource Wait for Cluster Server Process

RWIMG   Resource Wait for Image Activation Lock

RWLCK   Resource Wait for Lock ID database

RWMBX   Resource Wait on MailBox, either waiting for data in mailbox (to read) or
        waiting to place data (write) into a full mailbox (some other process
        has not read from it; mailbox is full so this process cannot write).

RWMPB   Resource Wait for Modified Page writer Busy

RWMPE   Resource Wait for Modified Page list Empty

RWNPG   Resource Wait for Non Paged Pool

RWPAG   Resource Wait for Paged Pool

RWPFF   Resource Wait for Page File Full

RWQUO   Resource Wait for Pooled Quota

RWSCS   Resource Wait for System Communications Services

RWSWP   Resource Wait for Swap File space

SUSP    Suspended, wait state; process placed into suspension; it can be resumed
        at the request of an external process

SUSPO   Suspended Outswapped, suspended but swapped out
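The states in Table A-1 fall into a few families: outswapped copies of a base state (COMO, LEFO, HIBO, SUSPO), involuntary waits that the table links to a possible memory shortage (COLPG, FPG, PFW), and resource waits (MWAIT, its listed subtypes, and the RWxxx states). The Python sketch below groups a state abbreviation into one of those families. It is only an illustration under that assumption: the function names and category labels are hypothetical and are not part of the Availability Manager.

```python
from typing import Iterable

# Hypothetical helpers, not part of the Availability Manager; the categories
# are derived only from the state descriptions in Table A-1.

BASE_CATEGORY = {
    "CUR": "executing",
    "COM": "ready to execute",
    "CEF": "event flag wait",
    "LEF": "event flag wait",
    "HIB": "hibernating",
    "SUSP": "suspended",
}

# Each outswapped state is a swapped-out copy of a base state.
OUTSWAPPED = {"COMO": "COM", "LEFO": "LEF", "HIBO": "HIB", "SUSPO": "SUSP"}

# Involuntary waits that Table A-1 associates with a possible memory shortage.
MEMORY_WAIT = {"COLPG", "FPG", "PFW"}

# MWAIT subtypes listed in Table A-1 that do not start with "RW".
MWAIT_OTHER = {"BWAIT", "JWAIT", "TWAIT", "EXH", "IMODE", "PSXFR"}


def classify(state: str) -> str:
    """Return a coarse category for a CPU process state abbreviation."""
    s = state.strip().upper()
    if s in BASE_CATEGORY:
        return BASE_CATEGORY[s]
    if s in OUTSWAPPED:
        return BASE_CATEGORY[OUTSWAPPED[s]] + " (outswapped)"
    if s in MEMORY_WAIT:
        return "memory wait"
    # MWAIT, its non-RW subtypes, and every RWxxx state are resource waits.
    if s == "MWAIT" or s in MWAIT_OTHER or s.startswith("RW"):
        return "resource wait"
    return "unknown"


def memory_wait_count(states: Iterable[str]) -> int:
    """Count processes in the COLPG, PFW, or FPG state."""
    return sum(1 for s in states if s.strip().upper() in MEMORY_WAIT)


if __name__ == "__main__":
    sample = ["CUR", "COMO", "PFW", "RWAST", "LEF", "FPG"]
    for st in sample:
        print(f"{st:<6} {classify(st)}")
    print("memory waits:", memory_wait_count(sample))
```

The COLPG, PFW, and FPG grouping mirrors the HIPWTQ event in Table B-1, which watches the average number of processes in those same three states.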
This appendix contains the following tables of events:

• OpenVMS Events, Table B-1

• Windows Events, Table B-2

Each table provides the following information:


• Alphabetical list of the events that the Availability Manager signals in the
Event pane of the System Overview window (Figure 1-1)


• Abbreviation and brief description of each event (also displayed in the Event
pane)


• Explanation of the event and a suggestion for remedial action, if applicable


Table B-1 OpenVMS Events

CFGDON  Configuration done
  Explanation: The server application has made a connection to the node and
  will start collecting the data according to the Customize Data Collection
  options.
  Recommended Action: This informational event indicates that the node is
  recognized. No further investigation is required.

DPGERR  Error executing driver program
  Explanation: The Data Collector has detected a program error while
  executing the data collection program.
  Recommended Action: This event can occur if you have a bad driver program
  library, or there is a bug in the driver program. Make sure you have the
  program library that shipped with the kit; if it is correct, contact your
  customer support representative with the full text of the event.

DSKERR  High disk error count
  Explanation: The error count for the disk device exceeds the threshold.
  Recommended Action: Check error log entries for device errors. A disk
  device with a high error count could indicate a problem with the disk or
  with the connection between the disk and the system.

DSKINV  Disk is invalid
  Explanation: The valid bit in the disk device status field is not set. The
  disk device is not considered valid by the operating system.
  Recommended Action: Make sure that the disk device is valid and is known to
  the operating system.

DSKMNV  Disk in mount verify state
  Explanation: The disk device is performing a mount verification.
  Recommended Action: The system is performing a mount verification for the
  disk device. This state could be caused by:
    • A removable disk on a local or remote node was removed.
    • A disk on a local or remote node has gone offline due to errors.
    • The node that serves the disk is down.
    • The connection to a remote disk is down.

DSKOFF  Disk device is off line
  Explanation: The disk device has been placed in the off line state.
  Recommended Action: Check whether the disk device should be off line. This
  event is also signalled when the same device name is used for two
  different physical disks. The volume name in the event is the second node
  to use the same device name.

DSKQLN  High disk queue length
  Explanation: The average number of pending I/Os to the disk device exceeds
  the threshold.
  Recommended Action: More I/O requests are being queued to the disk device
  than the device can service. Reasons include a slow disk or too much work
  being done on the disk.

DSKRWT  High disk RWAIT count
  Explanation: The RWAIT count on the disk device exceeds the threshold.
  Recommended Action: RWAIT is an indicator that an I/O operation has
  stalled, usually during normal connection failure recovery or volume
  processing of host-based shadowing. A node has probably failed and
  shadowing is recovering data.

DSKUNA  Disk device is unavailable
  Explanation: The disk device has been placed in the Unavailable state.
  Recommended Action: The disk device state has been set to /NOAVAILABLE. See
  DCL help for the SET DEVICE/AVAILABLE command.

DSKWRV  Wrong volume mounted
  Explanation: The disk device has been mounted with the wrong volume label.
  Recommended Action: Set the correct volume name by entering the DCL command
  SET VOLUME/LABEL on the node.

ELIBCR  Bad CRC for exportable program library
  Explanation: The CRC calculation for the exportable program library does
  not match the CRC value in the library.
  Recommended Action: The exportable program library may be corrupt. Restore
  the exportable program library from its original source.

ELIBNP  No privilege to access exportable program library
  Explanation: Unable to access the exportable program library.
  Recommended Action: Check to make sure that the Availability Manager has
  the proper security access to the exportable program library file.

ELIBUR  Unable to read exportable program library
  Explanation: Unable to read the exportable program library for the
  combination of hardware architecture and OpenVMS version.
  Recommended Action: The exportable program library may be corrupt. Restore
  the exportable program library from its original source.

FXCPKT  Received a corrupt fix response packet from node
  Explanation: The Availability Manager tried to perform a fix, but the fix
  acknowledgment from the node was corrupt.
  Recommended Action: This event could occur if there is network congestion
  or some problem with the node. Confirm the connection to the node, and
  reapply the fix if necessary.

FXCRSH  Crash node fix
  Explanation: The Availability Manager has successfully performed a Crash
  Node fix on the node.
  Recommended Action: This informational message indicates a successful fix.
  Expect to see a Path Lost event for the node.

FXDCPR  Decrement process priority fix
  Explanation: The Availability Manager has successfully performed a
  Decrement Process Priority fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  Setting a process priority too low takes CPU time away from the process.

FXDCWS  Decrement process working set size fix
  Explanation: The Availability Manager has successfully decreased the
  working set size of the process on the node by performing an Adjust
  Working Set fix.
  Recommended Action: This informational message indicates a successful fix.
  This fix disables the automatic working set adjustment for the process.

FXDLPR  Delete process fix
  Explanation: The Availability Manager has successfully performed a Delete
  Process fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  If the process is in RWAST state, this fix does not work. This fix also
  does not work on processes created with the no delete option.

FXEXIT  Exit image fix
  Explanation: The Availability Manager has successfully performed an Exit
  Image fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  Forcing a system process to exit its current image can corrupt the kernel.

FXINPR  Increment process priority fix
  Explanation: The Availability Manager has successfully performed an
  Increment Process Priority fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  Setting a process priority too high takes CPU time away from other
  processes. Set the priority above 15 only for “real-time” processing.

FXINQU  Increment process quota limits fix
  Explanation: The Availability Manager has successfully increased the quota
  limit of the process on the node by placing a new limit value in the limit
  field of the quota.
  Recommended Action: This informational message indicates a successful fix.
  This fix is only for the life of the process. If the problem continues,
  change the limit for the account in the UAF file.

FXINWS  Increment process working set size fix
  Explanation: The Availability Manager has successfully increased the
  working set size of the process on the node by performing an Adjust
  Working Set fix.
  Recommended Action: This informational message indicates a successful fix.
  This fix disables the automatic working set adjustment for the process.
  The adjusted working set value cannot exceed WSQUOTA for the process or
  WSMAX for the system.

FXNOPR  No-change process priority fix
  Explanation: The Availability Manager has successfully performed a Process
  Priority fix on the process that resulted in no change to the process
  priority.
  Recommended Action: This informational message indicates a successful fix.
  The Fix Value slider was set to the current priority of the process.

FXNOQU  No-change process quota limits fix
  Explanation: The Availability Manager has successfully performed a quota
  limit fix for the process that resulted in no change to the quota limit.
  Recommended Action: This informational message indicates a successful fix.
  The Fix Value slider was set to the current quota of the process.

FXNOWS  No-change process working set size fix
  Explanation: The Availability Manager has successfully performed an Adjust
  Working Set fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  The Fix Value slider was set to the current working set size of the
  process.

FXPGWS  Purge working set fix
  Explanation: The Availability Manager has successfully performed a Purge
  Working Set fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  The purged process might page fault to retrieve memory it needs for
  current processing.

FXPRIV  No privilege to attempt fix
  Explanation: The Availability Manager cannot perform a fix on the node due
  either to no CMKRNL privilege or to unmatched security triplets.
  Recommended Action: See Chapter 6 for details about setting up security.

FXQUOR  Adjust quorum fix
  Explanation: The Availability Manager has successfully performed an Adjust
  Quorum fix on the node.
  Recommended Action: This informational message indicates a successful fix.
  Use this fix when you find many processes in RWCAP state on a cluster
  node.

FXRESM  Resume process fix
  Explanation: The Availability Manager has successfully performed a Resume
  Process fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  If the process goes back into suspend state, check the AUDIT_SERVER process
  for problems.

FXSUSP  Suspend process fix
  Explanation: The Availability Manager has successfully performed a Suspend
  Process fix on the process.
  Recommended Action: This informational message indicates a successful fix.
  Do not suspend system processes.

FXTIMO  Fix timeout
  Explanation: The Availability Manager tried to perform a fix, but no
  acknowledgment for the fix was received from the node within the timeout
  period.
  Recommended Action: This event can occur if there is network congestion, if
  some problem is causing the node not to respond, or if the fix request
  failed to reach the node. Confirm the connection to the node, and reapply
  the fix if necessary.

FXUERR  Unknown error code for fix
  Explanation: The Availability Manager tried to perform a fix, but the fix
  failed for an unexpected reason.
  Recommended Action: Please contact your HP customer support representative
  with the text of this event. The event text is also recorded in the event
  log.

HIBIOR  High buffered I/O rate
  Explanation: The node’s average buffered I/O rate exceeds the threshold.
  Recommended Action: A high buffered I/O rate can cause high system
  overhead. If this is affecting overall system performance, use the I/O
  Summary to determine the high buffered I/O processes, and adjust their
  priorities or suspend them as needed.

HICOMQ  Many processes waiting in COM or COMO
  Explanation: The average number of processes on the node in the COM or
  COMO queues exceeds the threshold.
  Recommended Action: Use the CPU Mode Summary to determine which processes
  are competing for CPU resources. Possible adjustments include changing
  process priorities and suspending processes.

HIDIOR  High direct I/O rate
  Explanation: The average direct I/O rate on the node exceeds the threshold.
  Recommended Action: A high direct I/O rate can cause high system overhead.
  If this is affecting overall system performance, use the I/O Summary to
  determine the high direct I/O processes, and adjust their priorities or
  suspend them as needed.

HIHRDP  High hard page fault rate
  Explanation: The average hard page fault rate on the node exceeds the
  threshold.
  Recommended Action: A high hard page fault rate indicates that the free or
  modified page list is too small. Check Chapter 6 for possible actions.

HIMWTQ  Many processes waiting in MWAIT
  Explanation: The average number of processes on the node in the
  Miscellaneous Resource Wait (MWAIT) queues exceeds the threshold.
  Recommended Action: Use the CPU and Single Process pages to determine which
  resource is awaited. See Chapter 6 for more information about wait states.

HINTER  High interrupt mode time
  Explanation: The average percentage of time the node spends in interrupt
  mode exceeds the threshold.
  Recommended Action: Consistently high interrupt time prohibits processes
  from obtaining CPU time. Determine which device or devices are overusing
  this mode.

HIPINT  High interrupt mode time on Primary CPU
  Explanation: The average percentage of time the node spends in interrupt
  mode exceeds the threshold.
  Recommended Action: Consistently high interrupt time on the Primary CPU can
  slow down I/O and the servicing of various systems in OpenVMS. Enabling
  Fast Path helps distribute the servicing of interrupts from I/O among the
  CPUs on the node. Also, determine which device or devices are overusing
  this mode.

HIPRCT  High process count
  Explanation: The proportion of actual processes to maximum processes is too
  high. If the number of processes reaches the maximum (MAXPROCESSCNT), no
  more processes can be created and the system might hang as a result.
  Recommended Action: Decrease the number of actual processes, or increase
  the SYSGEN parameter MAXPROCESSCNT.

HIPWIO  High paging write I/O rate
  Explanation: The average paging write I/O rate on the node exceeds the
  threshold.
  Recommended Action: Use the Process I/O and Memory Summary pages to
  determine which processes are writing to the page file excessively, and
  decide whether their working sets need adjustment.

HIPWTQ  Many processes waiting in COLPG, PFW, or FPG
  Explanation: The average number of processes on the node that are waiting
  for page file space exceeds the threshold.
  Recommended Action: Use the CPU Process States and Memory Summary to
  determine which processes are in the COLPG, PFW, or FPG state. COLPG and
  PFW processes might be constrained by too little physical memory, overly
  restrictive working set quotas, or lack of available page file space. FPG
  processes indicate that too little physical memory is available.

HISYSP  High system page fault rate
  Explanation: The node’s average page fault rate for pageable system areas
  exceeds the threshold.
  Recommended Action: These are page faults from pageable sections in
  loadable executive images, paged pool, and the global page table. The
  system parameter SYSMWCNT might be set too low. Use AUTOGEN to adjust this
  parameter.

HITTLP  High total page fault rate
  Explanation: The average total page fault rate on the node exceeds the
  threshold.
  Recommended Action: Use the Memory Summary to find the page faulting
  processes, and make sure that their working sets are set properly.

HMPSYN  High multiprocessor (MP) synchronization mode time
  Explanation: The average percentage of time the node handles
  multiprocessor (MP) synchronization exceeds the threshold.
  Recommended Action: High synchronization time prevents other devices and
  processes from obtaining CPU time. Determine which device is overusing
  this mode.

HPMPSN  High MP synchronization mode time on Primary CPU
  Explanation: The average percentage of time the node handles
  multiprocessor (MP) synchronization exceeds the threshold.
  Recommended Action: High synchronization time prevents other devices and
  processes from obtaining CPU time. This is especially critical for the
  Primary CPU, which is the only CPU that performs certain tasks on OpenVMS.
  Determine which spinlocks are overusing this mode. Executing
  SYS$EXAMPLES:SPL.COM shows which spinlocks are being used.

KTHIMD  Kernel thread waiting for inner-mode semaphore
  Explanation: The average percentage of time that the kernel thread waits
  for the inner-mode semaphore exceeds the threshold.
  Recommended Action: Use SDA to determine which kernel thread of the process
  has the semaphore.

LCKBLK  Lock blocking
  Explanation: The process holds the highest priority lock in the resource’s
  granted lock queue. This lock is blocking all other locks from gaining
  access to the resource.
  Recommended Action: Use the Single Process windows to determine what the
  process is doing. If the process is in an RWxxx state, try exiting the
  image or deleting the process. If this fails, crashing the blocking node
  might be the only other fix option.

LCKCNT  Lock contention
  Explanation: The resource has a contention situation, with multiple locks
  competing for the same resource. The competing locks are the currently
  granted lock and those that are waiting in the conversion queue or in the
  waiting queue.
  Recommended Action: Use Lock Contention to investigate a potential lock
  contention situation. Locks for the same resource might have the NODLCKWT
  wait flag enabled and be on every member of the cluster. Usually this is
  not a lock contention situation, and these locks can be filtered out.

LCKWAT  Lock waiting
  Explanation: The process that has access to the resource is blocking the
  process that is waiting for it. Once the blocking process releases its
  access, the next highest lock request acquires the blocking lock.
  Recommended Action: If the blocking process holds the resource too long,
  check to see whether the process is working correctly; if not, one of the
  fixes might solve the problem.

LOASTQ  Process has used most of ASTLM quota
  Explanation: Either the remaining number of asynchronous system traps
  (ASTs) the process can request is below the threshold, or the percentage
  of ASTs used compared to the allowed quota is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  increase the ASTLM quota for the process in the UAF file. ASTLM is only a
  count; system resources are not compromised by increasing this count.

LOBIOQ  Process has used most of BIOLM quota
  Explanation: Either the remaining number of buffered I/Os (BIOs) the
  process can request is below the threshold, or the percentage of BIOs used
  is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  increase the BIOLM quota for the process in the UAF file. BIOLM is only a
  count; system resources are not compromised by increasing this count.

LOBYTQ  Process has used most of BYTLM quota
  Explanation: Either the remaining number of bytes for the buffered I/O byte
  count (BYTCNT) that the process can request is below the threshold, or the
  percentage of bytes used is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can raise
  the BYTLM quota for the process in the UAF file. BYTLM is the number of
  bytes in nonpaged pool used for buffered I/O.

LODIOQ  Process has used most of DIOLM quota
  Explanation: Either the remaining number of direct I/Os (DIOs) the process
  can request is below the threshold, or the percentage of DIOs used is
  above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can
  increase the DIOLM quota for the process in the UAF file. DIOLM is only a
  count; system resources are not compromised by increasing this count.

LOENQU  Process has used most of ENQLM quota
  Explanation: Either the remaining number of lock enqueues (ENQs) the
  process can request is below the threshold, or the percentage of ENQs
  used is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process is
  not able to make further lock queue requests. If the process requires a
  higher quota, you can increase the ENQLM quota for the process in the UAF
  file.

LOFILQ  Process has used most of FILLM quota
  Explanation: Either the remaining number of files the process can open is
  below the threshold, or the percentage of files open is above the
  threshold.
  Recommended Action: If the amount used reaches the quota, the process must
  first close some files before being allowed to open new ones. If the
  process requires a higher quota, you can increase the FILLM quota for the
  process in the UAF file.

LOMEMY  Free memory is low
  Explanation: For the node, the percentage of free memory compared to total
  memory is below the threshold.
  Recommended Action: Use the automatic Purge Working Set fix, or use the
  Memory and CPU Summary to select processes that are either not currently
  executing or not page faulting, and purge their working sets.

LOPGFQ  Process has used most of PGFLQUOTA quota
  Explanation: Either the remaining number of pages the process can allocate
  from the system page file is below the threshold, or the percentage of
  pages allocated is above the threshold.
  Recommended Action: If the process requires a higher quota, you can raise
  the PGFLQUOTA quota for the process in the UAF file. This value limits the
  number of pages in the system page file that the account’s processes can
  use.

LOPGSP  Low page file space
  Explanation: Either the remaining number of pages in the system page file
  is below the threshold, or the percentage of page file space remaining is
  below the threshold.
  Recommended Action: Either extend the size of this page file or create a
  new page file to allow new processes to use the new page file.

LOPRCQ  Process has used most of PRCLM quota
  Explanation: Either the remaining number of subprocesses the current
  process is allowed to create is below the threshold, or the percentage of
  created subprocesses is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process is
  not allowed to create more subprocesses. If the process requires a higher
  quota, you can increase the PRCLM quota for the process in the UAF file.

LOSTVC  Lost virtual circuit to node
  Explanation: The virtual circuit between the listed nodes has been lost.
  Recommended Action: Check to see whether the second node listed has failed
  or whether the connection between the nodes is broken. The VC name listed
  in parentheses is the communication link between the nodes.

LOSWSP  Low swap file space
  Explanation: Either the remaining number of pages in the system page file
  is below the threshold, or the percentage of page file space remaining is
  below the threshold.
  Recommended Action: Either increase the size of this page file, or create
  a new page file to allow new processes to use the new page file.

LOTQEQ  Process has used most of TQELM quota
  Explanation: Either the remaining number of timer queue entries (TQEs) the
  process can request is below the threshold, or the percentage of TQEs used
  compared to the allowed quota is above the threshold.
  Recommended Action: If the amount used reaches the quota, the process
  enters RWAST state. If the process requires a higher quota, you can raise
  the TQELM quota for the process in the UAF file. TQELM is only a count;
  system resources are not compromised by raising it.

LOVLSP  Low disk volume space
  Explanation: Either the remaining number of free blocks on the volume is
  below the threshold, or the percentage of free blocks remaining on the
  volume is below the threshold.
  Recommended Action: You must free up some disk volume space. If the volume
  is intended to be filled, such as a page or swap device, you can filter
  the volume from the display.

LOVOTE  Low cluster votes
  Explanation: The difference between the number of VOTES and the QUORUM in
  the cluster is below the threshold.
  Recommended Action: Check to see whether voting members have failed. To
  avoid the hang that results if VOTES goes below QUORUM, use the Adjust
  Quorum fix.

LOWEXT  Low process working set extent
  Explanation: The process page fault rate exceeds the threshold, and the
  percentage of working set size compared to working set extent exceeds the
  threshold.
  Recommended Action: This event indicates that the WSEXTENT value in the UAF
  file might be too low. The process needs more physical memory but cannot
  obtain it; therefore, the process page faults excessively.

LOWSQU  Low process working set quota
  Explanation: The process page fault rate exceeds the threshold, and the
  percentage of working set size compared to working set quota exceeds the
  threshold.
  Recommended Action: This event indicates the process needs more memory but
  might not be able to obtain it because one of the following is true:
    • The WSQUOTA value in the UAF file is set too low for the size of
      memory allocation requests.
    • The system is memory constrained.

LRGHSH  Remote lock hash table too large to collect data on
  Explanation: The Availability Manager cannot investigate the node’s
  resource hash table (RESHASHTBL). It is either too sparse or too dense to
  investigate efficiently.
  Recommended Action: This event indicates that the Availability Manager will
  take too many collection iterations to analyze lock contention situations
  efficiently. Make sure that the SYSGEN parameter RESHASHTBL is set properly
  for the node.

NOPGFL  No page file
  Explanation: The Availability Manager cannot find a page file on the node.
  Recommended Action: Use SYSGEN to create and connect a page file on the
  node.

NOPLIB  No program library
  Explanation: The program library for the combination of hardware
  architecture and OpenVMS version was not found.
  Recommended Action: Check to see that all the program library files exist
  in the program library directory.

NOPRIV  Not allowed to monitor node
  Explanation: The Availability Manager cannot monitor the node due to
  unmatched security triplets.
  Recommended Action: See Chapter 6 for details on setting up security.

NOPROC  Specific process not found
  Explanation: The Availability Manager cannot find the process name selected
  in the Process Name Search dialog box on the Node Summary page.
  Recommended Action: This event can occur because the listed process no
  longer exists, or the process name is listed incorrectly in the dialog box.

NOSWFL  No swap file
  Explanation: The Availability Manager cannot find a swap file on the node.
  Recommended Action: If you do not use swap files, you can ignore this
  event. Otherwise, use SYSGEN to create and connect a swap file for the
  node.

OPCERR  Event not sent to OPCOM
  Explanation: Either the Availability Manager was unable to send the event
  to OPCOM because of a setup problem or an error was returned by OPCOM.
  Recommended Action: A text message in the status field indicates that the
  Availability Manager was not configured properly, including missing
  shareable images or incorrectly defined logical names.

  A hexadecimal condition value in the status field indicates the reason
  that OPCOM was not able to post the event. The $SNDOPR system service
  returns this value. For a list of condition values and additional
  information, see the HP OpenVMS System Services Reference Manual.

OVOERR  Event not sent to OpenView
  Explanation: The Availability Manager was unable to send the event to
  OpenView.
  Recommended Action: The reason is stated in the event description in the
  Event pane. Problems can include the following:
    • The Availability Manager was not configured properly, including
      missing shareable images or incorrectly defined logical names.
    • An HP OpenView policy or template might not have been deployed
      properly.
    • A problem occurred communicating to or within OpenView.
    • The user does not have sufficient privileges or quotas, or both.
    • Too many events are waiting to be escalated by OpenView.

PKTFER  Packet format error
  Explanation: The data packet sent to the remote node was not in the
  correct format for the remote node to process.
  Recommended Action: Please contact your HP customer support representative
  with the full text of the event, the version of the Availability Manager,
  the configuration of the node running the Availability Manager, and the
  configuration of the nodes being monitored.

PLIBNP  No privilege to access program library
  Explanation: Unable to access the program library.
  Recommended Action: Check to see that the Availability Manager has the
  proper security access to the program library file.

PLIBUR  Unable to read program library
  Explanation: Unable to read the program library for the combination of
  hardware architecture and OpenVMS version.
  Recommended Action: The program library is either corrupt or from a
  different version of the Availability Manager. Restore the program library
  from the last installation.

PRBIOR  High process buffered I/O rate
  Explanation: The average buffered I/O rate of the process exceeds the
  threshold.
  Recommended Action: If the buffered I/O rate is affecting overall system
  performance, lowering the process priority or suspending the process
  would allow other processes to obtain access to the CPU.

PRBIOW  Process waiting for buffered I/O
  Explanation: The average percentage of time the process is waiting for a
  buffered I/O to complete exceeds the threshold.
  Recommended Action: Use SDA on the node to ensure that the device to which
  the process is performing buffered I/Os is still available and is not
  being overused.

PRCCOM  Process waiting in COM or COMO
  Explanation: The average number of processes on the node in the COM or
  COMO queues exceeds the threshold.
  Recommended Action: Use the CPU Summary to determine which processes
  should be given more CPU time, and adjust process priorities and states
  accordingly.
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PRCCUR Process has The average percentage of time Make sure that the listed process is not 
a high CPU the process is currently executing looping or preventing other processes 
rate in the CPU exceeds the threshold. from gaining access to the CPU. Adjust 

process priority or state as needed. 

PRCFND _ Process has The Availability Manager has No action required. 
recently been discovered the process name 
found selected on the Watch Process 

page (see Figure 7-24). 

PRCMUT Process The average percentage of time Use SDA to help determine which mutex 
waiting fora __ the process is waiting for a the process is waiting for and to help 
mutex particular system mutex exceeds determine the owner of the mutex. 

the threshold. 

PRCMWT  Process waiting in MWAIT
Explanation: The average percentage of time the process is in a Miscellaneous Resource Wait (MWAIT) state exceeds the threshold.
Recommended Action: Various resource wait states are part of the collective wait state called MWAIT. See Appendix A for a list of these states. The CPU Process page and the Single Process page display which state the process is in. Check the Single Process page to determine which resource the process is waiting for and whether the resource is still available for the process.

PRCPSX  Process waiting in PSXFR
Explanation: The average percentage of time the process waits during a POSIX fork operation exceeds the threshold.

PRCPUL  Most of CPULIM process quota used
Explanation: The remaining CPU time available for the process is below the threshold.
Recommended Action: Make sure the CPU time allowed for the process is sufficient for its processing needs. If not, increase the CPU quota in the UAF file of the node.

PRCPWT  Process waiting in COLPG, PFW, or FPG
Explanation: The average percentage of time the process is waiting to access the system page file database exceeds the threshold.
Recommended Action: Check to make sure the system page file is large enough for all the resource requests being made.

PRCQUO  Process waiting for a quota
Explanation: The average percentage of time the process is waiting for a particular quota exceeds the threshold.
Recommended Action: Use the Single Process pages to determine which quota is too low. Then adjust the quotas of the account in the UAF file.

PRCRWA  Process waiting in RWAST
Explanation: The average percentage of time the process is waiting in the RWAST state exceeds the threshold. RWAST indicates the process is waiting for an asynchronous system trap to complete.
Recommended Action: Use the Single Process pages to determine if RWAST is due to the process quota being set too low. If not, use SDA to determine if RWAST is due to a problem between the process and a physical device.

PRCRWC  Process waiting in RWCAP
Explanation: The average percentage of time the process is waiting in the RWCAP state exceeds the threshold. RWCAP indicates that the process is waiting for CPU capability.
Recommended Action: When many processes are in this state, the system might be hung because not enough nodes are running in the cluster to maintain the cluster quorum. Use the Adjust Quorum fix to correct the problem.

PRCRWM  Process waiting in RWMBX
Explanation: The average percentage of time the process is waiting in the RWMBX state exceeds the threshold. RWMBX indicates the process is waiting for a full mailbox to be empty.
Recommended Action: Use SDA to help determine which mailbox the process is waiting for.

PRCRWP  Process waiting in RWPAG, RWNPG, RWMPE, or RWMPB
Explanation: The average percentage of time the process is waiting in the RWPAG, RWNPG, RWMPE, or RWMPB state exceeds the threshold. RWPAG and RWNPG are for paged or nonpaged pool; RWMPE and RWMPB are for the modified page list.
Recommended Action: Processes in the RWPAG or RWNPG state can indicate that you need to increase the size of paged or nonpaged pool, respectively. Processes in the RWMPB state indicate that the modified page writer cannot handle all the modified pages being generated. See Chapter 6 for suggestions.

PRCRWS  Process waiting in RWSCS, RWCLU, or RWCSV
Explanation: The average percentage of time the process is waiting in the RWSCS, RWCLU, or RWCSV state exceeds the threshold. RWCSV is for the cluster server; RWCLU is for the cluster transition; RWSCS is for cluster communications. The process is waiting for a cluster event to complete.
Recommended Action: Use the Show Cluster utility to help investigate.

PRCUNK  Process waiting for a system resource
Explanation: The average percentage of time the process is waiting for an undetermined system resource exceeds the threshold.
Recommended Action: The state in which the process is waiting is unknown to the Availability Manager.

PRDIOR  High process direct I/O rate
Explanation: The average direct I/O rate of the process exceeds the threshold.
Recommended Action: If the I/O rate is affecting overall system performance, lowering the process priority might allow other processes to obtain access to the CPU.

PRDIOW  Process waiting for direct I/O
Explanation: The average percentage of time the process is waiting for a direct I/O to complete exceeds the threshold.
Recommended Action: Use SDA on the node to ensure that the device to which the process is performing direct I/Os is still available and is not being overused.

PRLCKW  Process waiting for a lock
Explanation: The average percentage of time the process is waiting in the control wait state exceeds the threshold.
Recommended Action: The control wait state indicates that a process is waiting for a lock. Although no locks might appear in Lock Contention, the awaited lock might be filtered out of the display.

PRPGFL  High process page fault rate
Explanation: The average page fault rate of the process exceeds the threshold.
Recommended Action: The process is memory constrained; it needs an increased number of pages to perform well. Make sure that the working set quotas and extents are set correctly. To increase the working set quota temporarily, use the Adjust Working Set fix.

PRPIOR  High process paging I/O rate
Explanation: The average page read I/O rate of the process exceeds the threshold.
Recommended Action: The process needs an increased number of pages to perform well. Make sure that the working set quotas and extents are set correctly. To increase the working set quota temporarily, use the Adjust Working Set fix.

PTHLST  Path lost
Explanation: The connection between the server and collection node has been lost.
Recommended Action: Check to see whether the node failed or whether the LAN segment to the node is having problems. This event occurs when the server no longer receives data from the node on which data is being collected.

RESDNS  Resource hash table dense
Explanation: The percentage of occupied entries in the hash table exceeds the threshold.
Recommended Action: A densely populated table can result in a performance degradation. Use the system parameter RESHASHTBL to adjust the total number of entries.

RESPRS  Resource hash table sparse
Explanation: The percentage of occupied entries in the hash table is less than the threshold.
Recommended Action: A sparsely populated table wastes memory resources. Use the system parameter RESHASHTBL to adjust the total number of entries.

UEXPLB  Using OpenVMS program export library
Explanation: The program library for the combination of hardware architecture and OpenVMS version was not found.
Recommended Action: Check to see that all the program library files exist in the program library directory.

UNSUPP  Unsupported node
Explanation: The Availability Manager does not support this combination of hardware architecture and OpenVMS version.
Recommended Action: Check the product SPD for supported system configurations.

VLSZCH  Volume size changed
Explanation: Informational message to indicate that the volume has been resized.
Recommended Action: No further investigation is required.

WINTRN  High window turn rate
Explanation: This indicates that current open files are fragmented. Reading from fragmented files or extending a file size, or both, can cause a high window turn rate.
Recommended Action: Defragment heavily used volumes using BACKUP or a disk fragmentation program. For processes that extend the size of a file, make sure that the file extent value is large. (See the $SET RMS/EXTEND_QUANTITY command documentation for more information.)
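
The RESDNS and RESPRS events both compare one number, the percentage of occupied entries in the resource hash table, against a threshold. The following Python sketch shows that arithmetic only. The function name and the threshold values (dense above 75 percent, sparse below 10 percent) are hypothetical and chosen for illustration; they are not the Availability Manager's defaults, and the sketch does not reflect how the product actually samples or averages the table.

```python
from typing import Optional


def classify_hash_table(occupied: int, total: int,
                        dense_pct: float = 75.0,
                        sparse_pct: float = 10.0) -> Optional[str]:
    """Return "RESDNS", "RESPRS", or None for one hash table sample.

    dense_pct and sparse_pct are illustrative (hypothetical) thresholds,
    not the Availability Manager's actual default values.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of hash table entries")
    # Percentage of hash table entries that are currently occupied.
    occupancy = 100.0 * occupied / total
    if occupancy > dense_pct:
        return "RESDNS"  # dense: more entries (a larger RESHASHTBL) may help
    if occupancy < sparse_pct:
        return "RESPRS"  # sparse: fewer entries would waste less memory
    return None


if __name__ == "__main__":
    print(classify_hash_table(900, 1024))  # about 87.9 percent occupied
    print(classify_hash_table(50, 1024))   # about 4.9 percent occupied
    print(classify_hash_table(300, 1024))  # about 29.3 percent occupied
```

With these assumed thresholds, the three sample calls print RESDNS, RESPRS, and None. Either event points to the same remedy, adjusting RESHASHTBL, in opposite directions.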


Table B-2 Windows Events


Event  Description  Explanation  Recommended Action


CFGDON  Configuration done
Explanation: The server application has made a connection to the node and will start collecting the data according to the Customize Data Collection options.
Recommended Action: An informational event to indicate that the node is recognized. No further investigation is required.

NODATA  Unable to collect performance data
Explanation: The Availability Manager is unable to collect performance data from the node.
Recommended Action: The performance data is collected by the PerfServ service on the remote node. Check to see that the service is up and running properly.

NOPRIV  Not allowed to monitor node
Explanation: The Availability Manager cannot monitor the node due to a password mismatch between the Data Collector and the Data Analyzer.
Recommended Action: See Chapter 6 for details on setting up security.

PTHLST  Path lost
Explanation: The connection between the Data Analyzer and the Data Collector has been lost.
Recommended Action: Check if the node crashed or if the LAN segment to the node is having problems. This event occurs when the server no longer receives data from the node on which data is being collected.

PVRMIS  Packet version mismatch
Explanation: This version of the Availability Manager is unable to collect performance data from the node because of a data packet version mismatch.
Recommended Action: The version of the Availability Manager Data Collector is more recent than the Data Analyzer. To process data from the node, upgrade the Data Analyzer to correspond to the Data Collector.

OpenVMS Events by Types of Data Collections


This appendix shows the events that can be signaled for each type of OpenVMS 
data collected. The events are categorized as follows: 


• Threshold events (Table C-1)

• Nonthreshold events (Table C-2)


Appendix B describes these events in detail and provides recommended actions. 


Note 


Chapter 7 describes how to enable the data collections listed in these tables. The only exceptions are the events listed under "Process name scan" in Table C-1, which are enabled on the Watch Process Customization page (see Figure 7-24).


Table C-1 OpenVMS Threshold Events 


Types of Data Collection  Event   Description

Disk status               DSKERR  High disk error count
                          DSKINV  Disk is invalid
                          DSKMNV  Disk in mount verify state
                          DSKMTO  Disk mount verify timeout
                          DSKOFF  Disk device is off line
                          DSKRWT  High disk RWAIT count
                          DSKUNA  Disk device is unavailable
                          DSKWRV  Wrong volume mounted
                          WINTRN  High window turn rate

Disk volume               DSKQLN  High disk queue length
                          LOVLSP  Low disk volume free space
                          VLSZCH  Volume size changed

Node summary              HIBIOR  High buffered I/O rate
                          HICOMQ  Many processes waiting in COM or COMO
                          HIDIOR  High direct I/O rate
                          HIHRDP  High hard page fault rate
                          HIMWTQ  Many processes waiting in MWAIT
                          HINTER  High interrupt mode time
                          HIPINT  High interrupt mode time on Primary CPU
                          HIPRCT  High process count
                          HIPWIO  High paging write I/O rate
                          HIPWTQ  Many processes waiting in COLPG, PFW, or FPG
                          HISYSP  High system page fault rate
                          HITTLP  High total page fault rate
                          HMPSYN  High multiprocessor (MP) synchronization mode time
                          HPMPSN  High MP synchronization mode time on Primary CPU
                          LOMEMY  Free memory is low

Lock contention           LCKCNT  Lock contention
                          LRGHSH  Remote lock hash table too large to collect data
                          RESDNS  Resource hash table dense
                          RESPRS  Resource hash table sparse

Single lock               LCKBLK  Lock blocking
                          LCKWAT  Lock waiting

Single process            KTHIMD  Kernel thread waiting for inner-mode semaphore
                          LOASTQ  Process has used most of ASTLM quota
                          LOBIOQ  Process has used most of BIOLM quota
                          LOBYTQ  Process has used most of BYTLM quota
                          LODIOQ  Process has used most of DIOLM quota
                          LOENQU  Process has used most of ENQLM quota
                          LOFILQ  Process has used most of FILLM quota
                          LOPGFQ  Process has used most of PGFLQUOTA quota
                          LOPRCQ  Process has used most of PRCLM quota
                          LOTQEQ  Process has used most of TQELM quota
                          LOWEXT  Low process working set extent
                          LOWSQU  Low process working set quota
                          PRBIOR  High process buffered I/O rate
                          PRBIOW  Process waiting for buffered I/O
                          PRCCOM  Process waiting in COM or COMO
                          PRCCUR  Process has a high CPU rate
                          PRCMUT  Process waiting for a mutex
                          PRCPSX  Process waiting in PSXFR wait state
                          PRCPUL  Most of CPULIM process quota used
                          PRCPWT  Process waiting in COLPG, PFW, or FPG
                          PRCQUO  Process waiting for a quota
                          PRCRWA  Process waiting in RWAST
                          PRCRWC  Process waiting in RWCAP
                          PRCRWM  Process waiting in RWMBX
                          PRCRWP  Process waiting in RWPAG, RWNPG, RWMPE, or RWMPB
                          PRCRWS  Process waiting in RWSCS, RWCLU, or RWCSV
                          PRCUNK  Process waiting for a system resource
                          PRDIOR  High process direct I/O rate
                          PRDIOW  Process waiting for direct I/O
                          PRLCKW  Process waiting for a lock
                          PRPGFL  High process page fault rate
                          PRPIOR  High process paging I/O rate

Process I/O               LOBIOQ  Process has used most of BIOLM quota
                          LOBYTQ  Process has used most of BYTLM quota
                          LODIOQ  Process has used most of DIOLM quota
                          LOFILQ  Process has used most of FILLM quota
                          PRBIOR  High process buffered I/O rate
                          PRDIOR  High process direct I/O rate
                          PRPIOR  High process paging I/O rate

Page/swap file            LOPGSP  Low page file space
                          LOSWSP  Low swap file space
                          NOPGFL  No page file
                          NOSWFL  No swap file

Cluster summary           LOVOTE  Low cluster votes

Memory                    LOWEXT  Low process working set extent
                          LOWSQU  Low process working set quota
                          PRPGFL  High process page fault rate
                          PRPIOR  High process paging I/O rate

CPU process               PRCCOM  Process waiting in COM or COMO
                          PRCCUR  Process has a high CPU rate
                          PRCMWT  Process waiting in MWAIT (see Appendix A for a breakdown of MWAIT state)
                          PRCPWT  Process waiting in COLPG, PFW, or FPG

Process name scan         NOPROC  Specific process not found
                          PRCFND  Process has been discovered recently


Table C-2 OpenVMS Nonthreshold Events 


Type of Data Collected     Event   Description

Node-level event           CFGDON  Configuration done
                           DPGERR  Error executing driver program
                           NOPRIV  Not allowed to monitor node
                           PKTFER  Packet format error
                           PTHLST  Path lost

Program library error      ELIBCR  Bad CRC for exportable program library
                           ELIBNP  No privilege to access exportable program library
                           ELIBUR  Unable to read exportable program library
                           NOPLIB  No program library
                           PLIBNP  No privilege to access program library
                           PLIBUR  Unable to read program library
                           UEXPLB  Using exportable program library
                           UNSUPP  Unsupported node

Events generated by fixes  FXCPKT  Received a corrupt fix response packet from node
                           FXCRSH  Crash node fix
                           FXDCPR  Decrement process priority fix
                           FXDCWS  Decrement process working set size fix
                           FXDLPR  Delete process fix
                           FXEXIT  Exit image fix
                           FXINPR  Increment process priority fix
                           FXINQU  Increment process quota limits fix
                           FXINWS  Increment process working set size fix
                           FXNOPR  No parameter change with fix to priority
                           FXNOQU  No quota change with fix to priority
                           FXNOWS  No working set change with fix to priority
                           FXPGWS  Purge working set fix
                           FXPRIV  No privilege to attempt fix
                           FXQUOR  Adjust quorum fix
                           FXRESM  Resume process fix
                           FXSUSP  Suspend process fix
                           FXTIMO  Fix timeout
                           FXUERR  Unknown error code for fix




Index


A

Access control lists (ACLs), 1-7
Adjust AST Queue Limit fix, 6-18
Adjust Quorum fix, 6-6
Adjust Working Set fix, 6-15
AMDS$AM_CONFIG logical name, 1-7
AMDS$AM_LOG:ANALYZEREVENTS.LOG file
  on OpenVMS systems, 5-5
AMDS$AM_LOGICALS.COM file, 7-10
AMDS$CONSOLE_ACCESS.DAT file, 1-9
AMDS$DRIVER_ACCESS.DAT file, 1-7
AMDS$GROUP_NAME logical name, 7-10
AMDS$SYSTARTUP_VMS.COM file, 2-3
AnalyzerEvents.log file
  on Windows systems, 5-5
APCs (asynchronous procedure calls), 3-8
ASTLM (AST limit) quota, B-6
Asynchronous procedure calls (APCs), 3-8
Asynchronous system traps (ASTs)
  Adjust AST Queue Limit fix, 6-18
Automatic data collection, 1-13
AVAIL/AVAIL command
  to start Availability Manager, 2-4
Availability Manager
  URL, 2-1
Availability messages
  sent to Data Analyzer, 1-10


B

Background data collection, 1-12
Blocks
  in use, remaining, 7-17
Bridging information for routers, 1-6
Buffered I/O
  byte limit (BYTLM), 3-18
  limit, 3-18
  rate, 3-17, 7-19, B-4
Buffered I/O (BIO) fix, 6-17
Byte limit remaining for process I/O, 3-18


C

Channels
  definition, 4-13
  details, 4-23
  LAN virtual circuit, 4-33, 4-34
  summary data, 4-13
Circuits
  with individual nodes, 4-2
Cluster hung fix, 6-3
Cluster interconnects
  fixes, 6-26, 6-27
Clusters
  See OpenVMS Clusters
Collecting data
  See Data collection
Collection intervals, 1-11, 3-1
Command procedures
  user action, 7-29
Commands
  user action, 7-29
Configuration, 1-3
Congestion control
  transmitting data, 4-32
Connection failed state, 2-4
CPU modes
  OpenVMS, 3-8, 3-9
  Windows, 3-7
CPU process states, A-1
  OpenVMS, 3-9
CPUs (central processing units)
  active
    number active on a node, 3-6
  configured
    number configured to run on a node, 3-6
  improving performance by suspending, 6-11
  modes, 3-7
  summary information, 3-7
  process states, A-1
  process summary, 3-11
  setting process priorities, 6-13
  usage, 3-7
  wait state, 3-11
Crash Node fix, 6-7
Customizing
  access codes, 1-9
  events, 5-3, 7-23
  levels of, 7-4
  OpenVMS
    data collection, 7-11
    data filters, 7-14
    events, 7-27
    group membership, 7-10
    security features, 7-33
  security features, 7-33
  security triplets, 1-8
  Windows
    events, 7-27
    group membership, 7-10
    security features, 7-33


D

Data Analyzer
  description, 1-3
  events log files, 5-5
  nodes, 1-3
  passwords, 1-6
  security, 1-6
  starting on OpenVMS Alpha, 2-3
  starting on Windows, 2-4
Data collection
  automatic, 1-13
  background, 1-12
  changing collection intervals, 7-13
  customization
    selecting data to collect, 2-11
  customizing settings, 7-11
  default, 2-11
  definition of one, 1-12
  events associated with, 1-12
  foreground, 1-13
  frequency of, 1-14
  intervals, 1-14
  specifying types, 2-10, 2-11
  state, 2-4
Data Collector
  for DECamds and Availability Manager, 2-1
  installing from latest kit, 2-3
  nodes, 1-3
  restarting, 7-10
  RMDRIVER, 4-1
  security
    OPCOM log, 1-7
    private LAN transport, 1-5
    security triplets, 1-6
Data filters
  changing values, 7-14
Data packets
  receipt, 4-31
  transmission, 4-30
DECamds, 1-1
  changes and enhancements
    no installation of server, 2-1
Deferred procedure calls (DPCs), 3-8
Delete Process fix, 6-9
DIOLM (Direct I/O limit), 3-18
Direct I/O fix, 6-16
Direct I/O rate, 3-17
Disk status
  filtering data, 7-15
Disks
  OpenVMS
    single disk summary, 3-21
    status summary, 3-20
    summaries, 3-19
    volume summary, 3-22
  Windows
    logical summary, 3-24
    physical summary, 3-25
Display data collection interval, 1-14


E

ECS
  criteria, 4-27
Equivalent Channel Set
  See ECS
Escalation of events, 7-28
Event data collection interval, 1-14
Event escalation, 7-23
Event pane, 2-12, 5-1
Events
  definition, 1-12
  displaying information, 5-1, 5-6
  escalation, 7-23
  log files, 5-5
  occurrence value, 7-28
  OpenVMS, B-1
  posting, 1-14
  severity, 5-2, 7-28
  signaling performance problems, 1-11
  testing for, 5-38
  threshold defaults, 4-3
  thresholds, 7-28
  thresholds for posting, 1-15
  user actions, 7-28
  Windows, B-13
Exit Image fix, 6-10
Extended LAN
  communication between Data Collector and Data Analyzer, 1-3


F

File protection
  for security, 1-7
Filtering data
  methods, 7-1
Filters
  OpenVMS CPU, 7-15
  OpenVMS disk status, 7-15
  OpenVMS disk volume, 7-17
  OpenVMS I/O, 7-18
  OpenVMS lock contention, 7-19
  OpenVMS memory, 7-20
  OpenVMS page/swap file, 7-21
  specifying types of data to collect, 7-14
Fixes
  adjusting AST queue limit, 6-18
  adjusting buffered I/O count limit, 6-17
  adjusting creation limit of subprocess, 6-22
  adjusting direct I/O count limit, 6-16
  adjusting I/O byte limit, 6-23
  adjusting lock queue limit, 6-20
  adjusting open file limit, 6-19
  adjusting pagefile quota limit, 6-24
  adjusting quorum, 6-6
  adjusting resource limits, 6-16
  adjusting time queue entry limit, 6-21
  adjusting working set size, 6-15
  changing process priority, 6-13
  CMKRNL privilege required, 6-4
  crashing a node, 6-7
  deleting a process, 6-9
  description, 6-1
  exiting an image, 6-10
  LAN checksumming, 6-28
  LAN device
    adjusting priority, 6-37
    setting maximum buffer, 6-38
    starting device, 6-39
    stopping device, 6-40
  LAN path
    changing hops, 6-35
  LAN path (channel)
    adjusting priority, 6-34
  LAN virtual circuit
    adjusting maximum receive window size, 6-30
    adjusting maximum transmit window size, 6-29
    compression, 6-31
    ECS maximum delay, 6-32
  list of available, 6-1
  memory usage, 6-2, 6-4
  problems and recommended fixes, 6-3
  purging a working set, 6-14
  results, 6-4
  resuming a process, 6-12
  suspending a process, 6-11
  system service calls associated with, 6-1, 6-4
  types, 6-1
Foreground data collection, 1-13


G

Galaxy ID, 3-6
Graphical user interface
  See GUI
Group/Node pane, 2-6
  See also Groups, Nodes
Groups
  See also Group/Node pane
  changing, 7-11
  displaying, 2-6
GUI (graphical user interface), 1-1


H

Hardware
  security triplet address, 1-8
Hardware model, 3-6
Help
  See Online help
HP OpenView
  configuring on your system, 7-25
  signalling events to, 7-23
  using on your system, 7-26


I

I/O (input/output)
  adjusting AST limits fix, 6-16
  current, threshold, and peak values, 3-17
  default data collection, 7-13
  page fault rate, 3-14
  page/swap files, 3-18
  process quotas, 3-36
  rates per process, 3-18
  summary, 3-16
I/O byte fix, 6-23
Icons
  colors represent states, 3-2
IEEE 802.3 Extended Packet format protocol, 1-3
Increasing resource limits fix, 6-16
Interrupts per second, 3-8
Intruder fix, 6-38
IPID (internal PID), 3-10


J

Job quotas in use
  single process, 3-38


JOB_CONTROL process, 6-14


L

LAN
  See LAN devices, LAN fixes, LAN paths
  displays, 4-1
LAN channels, 4-2
  details
    counters data, 4-24
    ECS criteria data, 4-27
    errors data, 4-25
    overview data, 4-23
    remote system data, 4-26
  details data, 4-23
  fixes, 6-33
  summary data, 4-13
LAN devices, 4-2
  data displayed, 4-15
  detail data, 4-17
  details
    errors data, 4-21
    events data, 4-20
    overview data, 4-17
    receive data, 4-19
    transmit data, 4-18
  fixes, 6-36
  overview data, 4-18
LAN fixes, 6-25
  adjusting device priority, 6-37
  adjusting priority, 6-34
  changing channel hops, 6-35
  ECS Maximum Delay, 6-32
  setting maximum packet size, 6-38
  starting device, 6-39
  stopping device, 6-40
  VC checksumming, 6-28
  VC compression, 6-31
  VC maximum receive window size, 6-30
  VC maximum transmit window size, 6-29
LAN paths, 4-2
LAN virtual circuits, 4-2
  detailed data, 4-29
  fixes, 6-28
  PEDRIVER, 4-2
  summary data, 4-11
LAVC
  See NISCA and SCS
Local area networks (LANs), 1-1
Lock block
  data, 3-28
  log file, 3-30
Lock block log
  example, 3-31
  location of file, 3-30
  reason for logging, 3-30
  resource name dump, 3-30
Lock contention
  OpenVMS, 3-25, 7-13
Lock contention page
  data displayed, 3-26
  decoded format, 3-26
  raw format, 3-27
Lock Contention page
  flags, 3-29
  formats, 3-26
  Lock ID, 3-29
  lock status, 3-26
  modes, 3-29
  number of locks, 3-26
  resource block address, 3-27
  resource names, 3-26
  resource value block dump, 3-27
  state of lock, 3-29
Lock queue limit fix, 6-20
Locks
  contention for, 3-25
Logical disks
  Windows, 3-24
Logical names
  sending messages to OPCOM, 1-11
Low memory fix, 6-3


M

Managed objects
  support for, 4-1
Managed objects display, 4-34
Memory, 3-6
  count, 3-14
  data, 7-20
  default data collection, 7-13
  low memory fix, 6-3
  OpenVMS summary, 3-13
  summaries, 3-11
  total for a node, 3-6
  Windows summary, 3-11
Memory usage
  displaying, 3-13
  fixes, 6-4
Menu bar
  in System Overview window, 2-12
Messages
  sending to OPCOM, 1-11
Modes
  See CPU modes
Monitoring processes, 7-36
Multicast "Hello" messages, 2-2, 2-4, 2-8, 2-9, 3-3
Mutexes
  held, 3-36
  number in node, 3-5
MWAIT state
  resource wait table, A-2


N

Network address
  security triplet, 1-8
Network Interconnect for the System Communications Architecture
  See NISCA
Network protocol, 1-2
NISCA
  LAN channels, 4-2
  LAN devices, 4-2
  LAN paths, 4-2
  LAN virtual circuits, 4-2
NISCA transport protocol, 4-29
Node data
  OpenVMS, 3-3
  summary, 3-6
  Windows, 3-4
Node pane
  data in OpenVMS display, 3-3
  data in Windows display, 3-4
  Windows, 3-4
Nodes
  See also Group/Node pane
  adjusting quorum, 6-6
  crashing a node, 6-7
  displaying data, 3-1
  fixes, 6-1, 6-5
  memory usage, 3-13
  OpenVMS, 3-3
  specifying data to collect, 2-10
  summary information, 3-6
NoEvent data collection interval, 1-14
Non-managed objects display, 4-33
Nonpaged pool
  displaying size, 3-12
NOPROC event, 7-37
Notifications from Data Collectors, 2-2


O

Occurrence counters
  definition, 5-2
Occurrences
  criterion for posting an event, 1-15
  event, 1-15
  low values, 7-28
Online help, 2-13
OPCOM
  signalling events to, 7-23
OPCOM (Operator Communication Manager)
  sending messages, 1-11
Open file limit fix, 6-19
OpenVMS Clusters
  hung, 6-3
  interconnect summary, 4-2
  members data, 4-2, 4-3
  running Availability Manager in, 1-3
  summary data, 4-2, 4-3
Operator communications manager (OPCOM)
  security log, 1-7
OS (operating system) version, 3-6


P

Packets discarded
  LAN virtual circuit, 4-37
Page faults, 3-17
  Adjust Working Set fix, 6-15
  Purging Working Set fix, 6-14
  rate, 3-14
Page files
  data collection, 7-13
Page/swap files, 3-18
Paged pool
  displaying size, 3-12
Pagefile quota fix, 6-24
Paging write I/O rate, 3-17
Pane
  Event, 2-12
Panes
  changing location of column headings, 3-3
  changing sizes, 2-12
  Group, 2-5
  System Overview window, 1-3, 2-5
Passwords, 1-6
  changing, 1-8, 7-34, 7-35
  default, 1-7
  security triplets, 1-6
Path lost state, 2-4
Performance
  identifying problems, 1-11
Physical disks
  Windows, 3-25
PID (process identifier), 3-10, 3-14, 3-18
PIO (paging I/O) rate, 3-18
Pipe quota
  for transmitted data, 4-32
PRCFND event, 7-37
PRCLM process limit, B-8
Printing
  display, 2-13
  using Windows Paint program, 2-13
Private LAN transport security, 1-5
Process information
  OpenVMS, 3-34
Process limits fixes, 6-16
Process memory fixes, 6-2
Process Priority fix, 6-2, 6-3, 6-4, 6-13
Process quotas
  adjusting, 6-3
  displaying data, 3-36
Processes
  adjusting limits, 6-2
  execution rates, 3-36
  filtering data, 7-15
  fixes, 6-1
  information about, 3-34
  job quotas, 3-38
  looping process fix, 6-3
  monitoring, 7-36
  privileges, 1-7
  Process Priority fix, 6-13
  quotas
    displaying, 3-36
  Resume Process fix, 6-12
  runaway process fix, 6-3
  single process data, 3-32
  Suspend Process fix, 6-11
  wait states, 3-36
  working sets, 3-35
Protocol
  for routers, 1-6
  IEEE 802.3 Extended Packet format, 1-3
  network, 1-2
  NISCA transport, 4-29
Purge Working Set fix, 6-14


Q

Quotas
  adjusting, 6-3
  job, 3-38
  process I/O, 3-36
  working set, B-5


R

RADs
  maximum number for a node, 3-6
Receiving information, 1-4
Requesting information, 1-4
Resource affinity domains
  See RADs
Resource availability
  displaying, 5-1
  fixes, 6-1
Restarting the Data Collector, 7-10
Resume Process fix, 6-12
Runaway process
  Process Priority fix, 6-3
  Suspend Process fix, 6-3


S

SCA
  LAN virtual circuits detailed data, 4-29
  summary data, 4-6
SCA data
  SCS connections data, 4-8
Screen
  capturing, 2-13
SCS
  System Communications Services
  circuits data, 4-2
  connections data, 4-2
  LAN channel data, 4-2
  LAN path data, 4-2
  LAN virtual circuit data, 4-2
  LAN virtual circuits detailed data, 4-29
SCS (System Communications Services)
  connections data, 4-8
Security, 1-6
  access control lists (ACLs), 1-7
  changing groups, 7-11
  changing passwords, 1-8, 7-34, 7-35
  data transfer, 1-6
  file protection, 1-7
  private LAN transport, 1-5
  process privileges, 1-7
  triplets, 1-6, 1-7
Security triplets
  access verification code, 1-8
  changing, 1-8
  description, 1-7
  files, 1-6
  format, 1-8
  hardware address, 1-8
  network address, 1-8
  operation, 1-10
  password, 1-8
  verifying, 1-10
  wildcard address, 1-9
Semaphores, 3-5
Serial Number, 3-6
Single disks, 3-21
Single process
  data, 3-32
  OpenVMS
    execution rates, 3-36
    job quotas, 3-38
    process I/O quotas, 3-36
    wait states, 3-36
    working set, 3-35
SMP (symmetric multiprocessing), 3-8
SNAP
  See 802.3 Extended Packet Format protocol
Sorting data, 2-12
Starting the Availability Manager, 2-3
Status bar
  in System Overview window, 2-13
Subprocess
  adjusting creation limit, 6-22
Subprocess creation limit fix, 6-22
Suspend Process fix, 6-3, 6-11
Swap files
  data collection, 7-13
SWAPPER process
  displaying, 3-36
  fixes ignored, 6-4
SYS$STARTUP directory, 2-3
SYSAPs (system applications)
  See System applications (SYSAPs)
System applications (SYSAPs), 4-7
System cache
  displaying size in use, 3-12
System Communications Architecture
  See SCA
System Communications Services
  See SCS
System Overview window
  components, 2-12
  how to use, 2-4
  menu bar, 2-12
  panes, 1-3
System service calls
  associated with fixes, 6-4


T

Threads, 3-4
Thresholds
  criteria for posting an event, 1-15
  events, 7-28
Timer Queue Entry Limit fix, 6-21
Title bar
  in System Overview window, 2-12
Tooltip
  example, 3-2
  explanation, 3-1
TQELM process limit
  raising in UAF file, B-8


U

Uptime, 3-6
User action
  events, 7-28
User actions
  executing on OpenVMS system, 7-30
  executing on Windows system, 7-31


V

Virtual circuits
  LAN
    channel selection data, 4-33, 4-34
    closures data, 4-36
    congestion control data, 4-32
    detailed data, 4-29
    for individual nodes, 4-2
    packets discarded data, 4-37
    receive data, 4-31
    summary data, 4-11
    transmit data packets, 4-30
Virtual memory
  displaying size, 3-12
Volume
  default data collection, 7-13


W

Wait states
  calculating, 3-37
  CPU, 3-11
  process, 3-36
Watch Process feature, 7-36
Wildcard address
  security triplet, 1-9
Window turn rate, 3-17
Working set extent, 3-14
Working sets
  data, 3-35
  pages, 3-35
  purging, 6-14
  size, 3-14
  size fix, 6-15
  too high or too low, 6-3


